Draft article

A new software stack for personal clouds

Dr Amir Chaudhry

A collection of open-source projects in the Systems Research Group is reinventing how developers create and deploy software for the 21st Century. This comes at a critical time, when huge numbers of devices are becoming connected to the Internet and promising to improve our lives even more than the World Wide Web has done. Many of these advances are taking place under the banners of Cloud Computing and the Internet of Things, yet few people have taken a critical look at the systems we build these services with and the trade-offs that are made. As we move into a world of almost ubiquitous computing, it's time to reconsider the old assumptions and ensure that developers of the future can create resilient, secure and maintainable products that put users in control. In this article, I'll cover a few of the technologies we're working on and provide links so you can delve deeper into the things that interest you. Detail on the overall projects is at nymote.org and openmirage.org.

The Internet and the trade-offs so far

The Internet has become such a huge part of our daily lives that it's difficult to imagine a world without information and connectivity (literally) at your fingertips. Collaboration is easier, communication is simpler and work is generally more productive than it's ever been. For example, companies such as Apple, Google and Microsoft provide services that manage our emails, while reducing spam, and seamlessly keep our mobile devices in sync for us. Services such as Dropbox and Google Drive make the process of collaborating in teams trivially simple. Social services like Facebook and Twitter make it fun and easy to keep up to date with friends, while LinkedIn helps you to manage your professional persona and visualise your network. All of these companies are successful because they provide something of value to their users.

However, we've also made sacrifices in order to achieve these gains, in many cases without fully comprehending the consequences. By adopting large centralised services we've made an implicit trade: in exchange for something useful, we share our habits and data with them. In doing so we've empowered Internet behemoths while simultaneously reducing our ability to influence them. We are at the mercy of ever-changing Terms of Service, which are not written in our best interests. We trust services to secure our data but repeated breaches undermine that trust. Services that we come to depend on are shut down, forcing us to look elsewhere and adapt our habits.

As things stand, we are fundamentally limited by the current system unless developers can create decentralised alternatives that compete. Many of the services we enjoy today require a constant connection to the Internet in order to be useful. This is becoming more important as the current wave of Internet of Things devices finds its way into our homes, with each device tethered to a new wave of centralised services and any interruption in power or bandwidth having tangible effects on our lives. Decentralised solutions are more robust in the long term as users can control when to change or update things. Such systems allow users to maintain the benefits of a networked world as well as achieve life-long control of their data. Just as the current incarnation of the Internet is built on a range of open-source technologies, we need a set of robust tools that allow the developers of the future to build distributed systems and services that empower users. Specifically, these tools should enable the creation of resilient decentralised systems that incorporate privacy from the ground up so that users retain control of their networks and data. Promises of "your data is safe with us" need to be underpinned by a technological foundation that reduces the scope for human error and makes it easy to build scalable systems.

The fundamental infrastructure needs to address three critical problems: (1) managing the lifecycle of applications, from development to maintenance, in highly diverse environments; (2) persisting data and sharing it across systems; (3) providing identity and allowing users to create a personal network of their own devices.

A new toolstack for a distributed world

We're working on each of the problems described above and in the rest of this article I'll cover three emerging technologies, which together form a new toolstack for the personal Internet. The first is Mirage OS, which takes a clean-slate approach to the operating system and how we build applications. The second is Irmin, which rethinks how we persist data based on the principles of version control systems and the need for sync. Finally there is Signpost, which is a nascent project that will tackle issues around identity and connectivity. These emerging tools form the bedrock on which anyone can build robust and scalable applications: applications that provide all the great things we're used to, with the additional benefits of resilience, ownership and privacy that come with decentralised networks.

Mirage OS to write and maintain distributed applications

Most applications that run in the cloud aren't optimised to do so. They inherently carry assumptions about the underlying operating system with them, including vulnerabilities and bloat. Compartmentalisation of large servers into smaller virtual machines has enabled many new businesses to get started and achieve scale. This has been great for new services, but many of those virtual machines are single-purpose, yet they contain largely complete operating systems which themselves run applications like web servers. This means a large part of the footprint is unused and unnecessary, which is both costly and a security risk (due to the larger attack surface). Moreover, as embedded devices become more important we cannot apply the same heavyweight approaches to these constrained environments.

Mirage OS revisits earlier concepts from the Lab of the 'library operating system', where only the necessary components of the OS are included and compiled along with the application into something we dub a 'unikernel'. These are highly efficient and extremely lean virtual machines, with a much smaller attack surface. Mirage works by treating the Xen hypervisor (itself an alumnus of the Lab, and the hypervisor that powers Amazon EC2 and Rackspace) as a stable hardware platform. This allows us to focus on high-performance protocol implementations without worrying about supporting the thousands of device drivers found in a traditional OS. As such, Mirage OS includes clean-slate functional implementations of protocols including TCP/IP, DNS, SSH, OpenFlow (switch/controller), HTTP, XMPP and Xen inter-VM transports.
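The 'only the necessary components' idea can be illustrated with a toy dependency walk. This is a minimal sketch in Python rather than OCaml, and it does not show how the Mirage OS tooling actually works: the library names, the dependency graph and the unikernel_components function are all hypothetical, chosen only to show why an image built this way leaves unused code out.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph: each library lists the libraries it needs.
# The names are illustrative only and do not mirror real Mirage OS packages.
DEPENDENCIES: Dict[str, List[str]] = {
    "website": ["http"],
    "http": ["tcpip"],
    "dns": ["tcpip"],
    "tcpip": ["netif"],
    "netif": [],
    "ssh": ["tcpip", "crypto"],
    "crypto": [],
    "block-storage": [],
}


def unikernel_components(app: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return the application plus everything it transitively depends on."""
    needed: Set[str] = set()
    stack = [app]
    while stack:
        name = stack.pop()
        if name in needed:
            continue  # already visited via another path
        needed.add(name)
        stack.extend(deps[name])
    return needed


image = unikernel_components("website", DEPENDENCIES)
left_out = set(DEPENDENCIES) - image
print("included:", sorted(image))
print("left out:", sorted(left_out))
```

In this toy graph the website pulls in only its HTTP and networking layers, while SSH, DNS, crypto and storage code never make it into the image, which is the source of the smaller footprint and attack surface described above.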

Unikernels made this way can be deployed directly to both cloud and embedded devices, with the benefits of reduced costs, increased security and scalability. In many cases, unikernels themselves can be tracked in version control systems, which provides a very convenient and familiar workflow when it comes to deployments. In this way, managing the lifecycle of an application is no more complex than managing source code.

Application code is developed in a high-level functional programming language (OCaml) and there are now over 70 libraries which map directly to operating system constructs when being compiled for production deployment. An increasing number of these are developed by a growing open-source community, and the project has achieved some major milestones in recent years, including being incubated by the Linux Foundation (via Xen Project) and achieving a major 2.0 release last summer.

As Mirage OS is one of the more mature projects, there is a lot more information on the completely self-hosted website (openmirage.org), including a list of papers, an up-to-date blog and installation instructions. The quickest way to get started is to build your personal website as a unikernel, which many people have now done. Using these approaches allows us to deploy and manage services that are designed to be small, lean and secure. By turning software into small unikernels, Mirage OS can take full advantage of existing infrastructure while dramatically reducing costs and simultaneously increasing security and scalability.

Since Mirage OS takes care of how we create applications, the next item is how to persist data such that we can track provenance and sync remote stores. This is the domain of Irmin.

Irmin to persist and sync data with full provenance

Having multiple devices has made the engineering effort around data persistence and sync more complex. We have to concern ourselves with how data obtained on one device is made available and useful to others, without losing history. Many third-party services have arisen to tackle this specific issue, yet most of them rely on the premise of having one canonical location from which all other devices take their cues. This is invariably somewhere up in the cloud and is subject to the whims of the providers.

Irmin is a new kind of library database that takes the principles behind tools like Git and applies them to the wider problem of storing and syncing our data. Specifically, it's a collection of libraries designed to solve different flavours of the challenges raised by the CAP theorem. Each application can select the right combination of features to solve its particular problem. Irmin enables a fully distributed workflow, with support for disconnected operation, efficient merge operations and application-specific conflict resolution algorithms. In effect, this means that all history can be kept and moved between devices with ease, and it offers a way to circulate and integrate data among remote workers, sensors or devices in different connectivity environments.
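To make 'application-specific conflict resolution' concrete, here is a minimal sketch of a Git-style three-way merge over a key-value store, written in Python. It is not Irmin's API: the Store type, the three_way_merge and merge_counter functions and the sample data are assumptions made purely for illustration. The point is that when two devices change the same key while disconnected, the application supplies the rule for combining them.

```python
from typing import Callable, Dict, Optional

Store = Dict[str, int]
# A resolver receives (ancestor, ours, theirs) for one key and returns the merged value.
Resolver = Callable[[Optional[int], int, int], int]


def merge_counter(ancestor: Optional[int], ours: int, theirs: int) -> int:
    """Hypothetical application rule: values are counters, so add both deltas."""
    base = ancestor if ancestor is not None else 0
    return base + (ours - base) + (theirs - base)


def three_way_merge(ancestor: Store, ours: Store, theirs: Store,
                    resolve: Resolver) -> Store:
    """Merge two diverged stores using their common ancestor."""
    merged: Store = {}
    for key in sorted(set(ancestor) | set(ours) | set(theirs)):
        a, o, t = ancestor.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o  # both sides agree (including both deleting the key)
        elif o == a:
            value = t  # only their side changed
        elif t == a:
            value = o  # only our side changed
        elif o is None or t is None:
            # One side deleted what the other modified: keep the surviving edit.
            value = o if o is not None else t
        else:
            value = resolve(a, o, t)  # true conflict: defer to the application
        if value is not None:
            merged[key] = value
    return merged


# Two devices diverge from the same starting point while offline.
ancestor = {"steps": 100, "battery": 80}
phone = {"steps": 150, "battery": 80}
watch = {"steps": 130, "battery": 60, "alarms": 1}

result = three_way_merge(ancestor, phone, watch, merge_counter)
print(result)
```

Here both devices incremented the step counter, so the counter rule adds the two increments instead of discarding one of them; a different application could plug in a different resolver without touching the merge logic.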

Currently in beta release, Irmin already has a number of early adopters including Xenstore, where it's been used to add fault-tolerance. You can find more details at the following links:

Introducing Irmin: http://openmirage.org/blog/introducing-irmin
Adding fault-tolerance to Xenstore: http://openmirage.org/blog/introducing-irmin-in-xenstore

Taken together with Mirage OS, we can build applications across many devices and easily sync data between them. The remaining piece of the puzzle is the practical matter of connecting and identifying our personal cloud of devices, which is where Signpost comes in.

Signpost to create a personal network of devices

Forming efficient connections between end-points is becoming more important as the number of devices we own increases. These devices need to be able to identify and reach each other, regardless of their location on the network or the obstacles in between. Peer-to-peer services are notoriously hard to use due to the prevalence of firewalls and NAT middleboxes that prevent all of our devices from seeing each other. Many cloud services have sprung up to assist with this, but they all require you to give access to your personal infrastructure to an external third party.

Signpost is a nascent project that will tackle the problem by probing connectivity and providing a constant "pointer" to your devices. In effect, it creates 'personal cloud' infrastructure that lets your devices reach each other securely without requiring any complex configuration. In addition, it will be able to understand the internal structure of networks (e.g. the home) and directly connect devices that are on the same network without routing via the Internet. This means tackling questions around identity, security and routing, which are challenging problems in their own right.
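The preference for direct connections on a shared network can be sketched as a simple decision rule. The Python below is a toy model and not Signpost's design: the Device fields, the choose_path function and the 'relayed' fallback are all assumptions introduced for illustration, and real connectivity probing is considerably more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    name: str
    local_network: str             # hypothetical identifier for the device's LAN
    public_address: Optional[str]  # None when hidden behind NAT or a firewall


def choose_path(src: Device, dst: Device) -> str:
    """Pick the most direct way for src to reach dst (illustrative rule only)."""
    if src.local_network == dst.local_network:
        return "direct-local"      # same network: no need to leave the home
    if dst.public_address is not None:
        return "direct-internet"   # destination is publicly reachable
    return "relayed"               # assumed fallback via some rendezvous point


laptop = Device("laptop", "home", None)
tv = Device("tv", "home", None)
server = Device("server", "datacentre", "203.0.113.10")  # documentation-range address
phone = Device("phone", "mobile", None)

print(choose_path(laptop, tv))      # direct-local
print(choose_path(laptop, server))  # direct-internet
print(choose_path(phone, laptop))   # relayed
```

The ordering of the checks encodes the priority described above: traffic between devices in the same home stays local, and the Internet is used only when it has to be.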

This project is in its early phases and we've presented use cases and designs at the USENIX Security workshop on Free and Open Communications on the Internet (FOCI). As the work develops, and in combination with Mirage OS and Irmin, each of us will be able to move away from the centralised providers and begin to claim our own piece of the Internet.

FOCI paper: http://nymote.org/docs/2013-foci-signposts.pdf

Nymote - putting the pieces together

Each of the above projects is significant in its own right, but when they are used together we will be able to build robust applications that will push towards a more decentralised web and empower new kinds of innovation and exploration. Building on the right foundations means the core aspects of managing services and syncing between devices will already be accounted for, so the focus can shift to functionality and extensibility.

The initial applications we're working towards include the triumvirate of Mail, Contacts and Calendar management. These three applications sit at the core of our work and personal lives yet they are routinely owned by other companies and have seen limited advances. Many services have attempted to solve specific pieces of these problems, for example online social networks might replace contact management, but it becomes very obvious that the three elements are regularly used together - consider messaging friends to schedule an event. Once each of us can fully own and easily maintain this kind of infrastructure, then it can become a platform for other developers to build on while putting the end-users, us, at the centre of our digital networks.

To keep up with this work, you can visit the Nymote website (http://nymote.org) and if you'd like to contribute to the underlying technology then look into Mirage OS (openmirage.org)!
