- The Model
- The Present and the Future
- GDP Code and Documentation
- Possible Student Projects
This project covers the core GDP functionality. In particular, it is home to the client-side C library, various language bindings around the C library, the log server and various utilities around it, the GDP router, and some small projects related to the Urban Heartbeat. We hope eventually to clean it up and migrate components other than the core functionality (such as the Urban Heartbeat material) to their own independent projects. For an overview, see the wiki.
As we discuss in a recent paper, "The Cloud is not Enough: Saving IoT from the Cloud", the widespread practice of constructing swarm applications by connecting directly to the cloud comes with a variety of downsides. With the GDP, we seek an infrastructure that enables important new use cases for the cloud while still integrating smoothly with existing cloud infrastructure.
The Global Data Plane (GDP) provides a data-centric glue for swarm applications. The basic primitive is a secure, single-writer, append-only log known as a DataCapsule. Data inputs are signed, timestamped, and sorted by timestamp. Data can be securely committed to the log in a variety of ways, including via an external consistent transactional model. Data within the log can be read (either randomly or by subscription), thereby permitting a variety of data models, including (eventually) a SQL query model. Further, data within a log can be preserved for the long term.
- A Case for the Universal DataPlane
- Another overview, largely similar to this one
- Global Data Plane: A Federated Vision for Secure Data in Edge Computing (IEEE International Conference on Distributed Computing Systems, 2019)
- Towards a Global Data Infrastructure (IEEE Journal on Internet Computing, 2016)
The GDP consists of append-only logs ("DataCapsules") and a routing layer. Each log is named by an opaque 256-bit number (a "GDPname"), which is the SHA-256 hash of the metadata associated with the log; in particular, the name has no direct connection with the location of the data. Logs are append-only and consist of a series of records; each record consists of a record number, a commit timestamp (for coherency), variable-sized opaque data, and a proof of correctness (often a digital signature). It is important that the data be opaque since (in the longer term) all data should be encrypted, and the GDP will not hold the keys. Logs can be replicated and migrated.
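The record layout described above can be sketched as a small in-memory structure. This is only an illustration, not the GDP implementation: the names `Record` and `AppendOnlyLog`, the choice to number records from 1, and the caller-supplied `proof` bytes are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """One log entry; frozen so a stored record cannot be modified."""
    recno: int        # record number (assumed here to start at 1)
    timestamp: float  # commit timestamp
    data: bytes       # opaque payload; the log never interprets it
    proof: bytes      # proof of correctness, e.g. a signature made by the writer


class AppendOnlyLog:
    """Hypothetical in-memory log that only supports append and read."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, data: bytes, proof: bytes,
               timestamp: Optional[float] = None) -> Record:
        # The log assigns the next record number; nothing is ever overwritten.
        ts = time.time() if timestamp is None else timestamp
        record = Record(recno=len(self._records) + 1,
                        timestamp=ts, data=data, proof=proof)
        self._records.append(record)
        return record

    def read(self, recno: int) -> Record:
        # Random read by record number.
        if not 1 <= recno <= len(self._records):
            raise KeyError(f"no record numbered {recno}")
        return self._records[recno - 1]

    def __len__(self) -> int:
        return len(self._records)
```

The sketch deliberately offers no update or delete operation, and it treats `data` and `proof` as uninterpreted bytes, mirroring the point that the GDP does not hold the keys needed to read the payload.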
The metadata consists of an ordered list of key-value pairs. One of the entries is always the public half of a key pair. Thus, an application knowing the name of a log can verify that the metadata has not been corrupted, and from that can verify signatures on the data contained in the body of the log.
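The name check described above can be sketched as follows. The source states only that a GDPname is a SHA-256 over the log's metadata; the byte encoding of the metadata is not specified here, so the length-prefixed encoding in `serialize_metadata`, the function names, and the example keys are assumptions made purely for illustration.

```python
import hashlib
import hmac
from typing import List, Tuple

# Metadata is an ordered list of key-value pairs, so order affects the name.
Metadata = List[Tuple[str, bytes]]


def serialize_metadata(metadata: Metadata) -> bytes:
    """Hypothetical encoding: each key and value is length-prefixed (4 bytes)."""
    out = bytearray()
    for key, value in metadata:
        key_bytes = key.encode("utf-8")
        out += len(key_bytes).to_bytes(4, "big") + key_bytes
        out += len(value).to_bytes(4, "big") + value
    return bytes(out)


def gdpname(metadata: Metadata) -> bytes:
    """Derive a 256-bit name as the SHA-256 digest of the serialized metadata."""
    return hashlib.sha256(serialize_metadata(metadata)).digest()


def verify_metadata(name: bytes, metadata: Metadata) -> bool:
    """Return True if the metadata hashes to the name the application holds."""
    return hmac.compare_digest(name, gdpname(metadata))


# Example values are placeholders, not a real key or real GDP metadata fields.
example = [("public_key", b"\x01\x02\x03"), ("creator", b"alice")]
name = gdpname(example)
print(name.hex(), verify_metadata(name, example))
```

Once the metadata has been checked against the name, the public key it contains can be trusted to verify the signatures on individual records; that signature check is left out of this sketch.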
Routing allows any node in the GDP to find at least one copy of any named entity, given its GDPname. Entities can be logs, but ultimately they may include services and users (so there is one shared namespace for everything). Entities that are not logs (sometimes called "Agents") also have a key pair and associated metadata, so an application can verify that the service is real as well.
The GDP implements mechanism, but not policy, which is mediated by a separate Control Plane. For example, the GDP handles the mechanics of replication and migration, but the choice of when and where to replicate is made by a higher level service that resides in the Control Plane on the basis of on-the-fly performance monitoring or other criteria.
Access control is based on cryptography. Write (append) access control relies on valid writers signing each message, with the GDP itself holding the public keys of authorized writers for verification. There is no read/subscribe access control; the privacy of the data depends on the data being encrypted.
Higher level services may be layered on the GDP; for example, a service might combine the results of multiple logs into another log, or copy log data to specialized databases. In these cases the original logs are the "base truth", with everything else being a form of cache. Note however that these services are not part of the GDP, but rather are users of it.
The GDP needs to be self-healing and resistant to attack. Network partitions might in severe cases result in inaccessible data, but single node failure should not, and in no case should the GDP suffer catastrophic failure, even in the face of some nodes being compromised.
The Present and the Future
The prototype implementation of the GDP has limited functionality: a single-instance daemon with basic read/subscribe/publish primitives. Data is completely opaque (this is a feature, since the intent is that all data will be encrypted), and the only metadata is a record number and a commit timestamp. There is no access control on either read or write.
There are several short- or medium-term projects that are either in progress or will start soon. See https://gdp.cs.berkeley.edu/redmine/projects/gdp/wiki/GDP_Task_List for details.
GDP Code and Documentation
The initial prototype of the Global Data Plane code (client libraries and log server) is available on the U.C. Berkeley EECS repository at one of these URIs:
- https://repo.eecs.berkeley.edu/git-anon/projects/swarmlab/gdp.git (anonymous)
- git://repo.eecs.berkeley.edu/projects/swarmlab/gdp.git (anonymous)
If you have an account on repo.eecs.berkeley.edu you can also use:
The GDP router is available at:
Possible Student Projects
See GDP Project List for summaries of possible projects that should be "student sized".