Conference Papers

Gonçalves R, Almeida PS, Moreno CB, Fonte V.  2017.  DottedDB: Anti-Entropy without Merkle Trees, Deletes without Tombstones. 36th IEEE International Symposium on Reliable Distributed Systems . Abstractdotteddb_srds.pdf

To achieve high availability in the face of network partitions, many distributed databases adopt eventual consistency, allow temporary conflicts due to concurrent writes, and use some form of per-key logical clock to detect and resolve such conflicts. Furthermore, nodes synchronize periodically to ensure replica convergence in a process called anti-entropy, normally using Merkle Trees. We present the design of DottedDB, a Dynamo-like key-value store, which uses a novel node-wide logical clock framework, overcoming three fundamental limitations of the state of the art: (1) minimize the metadata per key necessary to track causality, avoiding its growth even in the face of node churn; (2) correctly and durably delete keys, with no need for tombstones; (3) offer a lightweight anti-entropy mechanism to converge replicated data, avoiding the need for Merkle Trees. We evaluate DottedDB against MerkleDB, an otherwise identical database, but using per-key logical clocks and Merkle Trees for anti-entropy, to precisely measure the impact of the novel approach. Results show that: causality metadata per object always converges rapidly to only one id-counter pair; distributed deletes are correctly achieved without global coordination and with constant metadata; divergent nodes are synchronized faster, with less memory-footprint and with less communication overhead than using Merkle Trees.

Almeida PS, Moreno CB, Fonte V, Gonçalves R.  2015.  Concise Server-Wide Causality Management for Eventually Consistent Data Stores. DAIS - IFIP International Conference on Distributed Applications and Interoperable Systems. Abstractglobal_logical_clocks.pdf

Large scale distributed data stores rely on optimistic replication to scale and remain highly available in the face of network partitions. Managing data without coordination results in eventually consistent data stores that allow for concurrent data updates. These systems often use anti-entropy mechanisms (like Merkle Trees) to detect and repair divergent data versions across nodes. However, in practice hash-based data structures are too expensive for large amounts of data and create too many false conflicts. Another aspect of eventual consistency is detecting write conflicts. Logical clocks are often used to track data causality, necessary to detect causally concurrent writes on the same key. However, there is a nonnegligible metadata overhead per key, which also keeps growing with time, proportional with the node churn rate. Another challenge is deleting keys while respecting causality: while the values can be deleted, perkey metadata cannot be permanently removed without coordination. We introduce a new causality management framework for eventually consistent data stores, that leverages node logical clocks (Bitmapped Version Vectors) and a new key logical clock (Dotted Causal Container) to provides advantages on multiple fronts: 1) a new efficient and lightweight anti-entropy mechanism; 2) greatly reduced per-key causality metadata size; 3) accurate key deletes without permanent metadata.

Almeida PS, Moreno CB, Gonçalves R, Preguiça N, Fonte V.  2014.  Scalable and Accurate Causality Tracking for Eventually Consistent Stores. Distributed Applications and Interoperable Systems. 8460 Abstractdvvset-dais.pdf

In cloud computing environments, data storage systems often rely on optimistic replication to provide good performance and availability even in the presence of failures or network partitions. In this scenario, it is important to be able to accurately and efficiently identify updates executed concurrently. Current approaches to causality tracking in optimistic replication have problems with concurrent updates: they either (1) do not scale, as they require replicas to maintain information that grows linearly with the number of writes or unique clients; (2) lose information about causality, either by removing entries from client-id based version vectors or using server-id based version vectors, which cause false conflicts. We propose a new logical clock mechanism and a logical clock framework that together support a traditional key-value store API, while capturing causality in an accurate and scalable way, avoiding false conflicts. It maintains concise information per data replica, only linear on the number of replica servers, and allows data replicas to be compared and merged linear with the number of replica servers and versions.

Moreno CB, Almeida PS, Fonte V, Preguiça N, Gonçalves R.  2012.  Brief announcement: efficient causality tracking in distributed storage systems with dotted version vectors. Proceedings of the symposium on Principles of distributed computing - PODC. :335–336. Abstractp335-preguica.pdf

Version vectors (VV) are used pervasively to track dependencies between replica versions in multi-version distributed storage systems. In these systems, VV tend to have a dual functionality: identify a version and encode causal dependencies. In this paper, we show that by maintaining the identifier of the version separate from the causal past, it is possible to verify causality in constant time (instead of O(n) for VV) and to precisely track causality with information with size bounded by the degree of replication, and not by the number of concurrent writers.

Gonçalves R, Almeida PS, Moreno CB, Fonte V, Preguiça N.  2011.  Evaluating Dotted Version Vectors in Riak. INForum - Simpósio de Informática. :474–479. Abstractmain.pdf

The NoSQL movement is rapidly increasing in importance, acceptance and usage in major (web) applications, that need the partition-tolerance and availability of the CAP theorem for scalability purposes, thus sacrificing the consistency side. With this approach, paradigms such as Eventual Consistency became more widespread. An eventual consistent system must handle data divergence and conflicts, that have to be carefully accounted for. Some systems have tried to use classic Version Vectors (VV) to track causality, but these reveal either scalability problems or loss of accuracy (when pruning is used to prevent vector growth).

Dotted Version Vectors (DVV) is a novel mechanism for dealing with data versioning in eventual consistent systems, that allows both accurate causality tracking and scalability both in the number of clients and servers, while limiting vector size to replication degree.

In this paper we describe briefly the challenges faced when incorporating DVV in Riak (a distributed key-value store), evaluate its behavior and performance, and discuss the advantages and disadvantages of this specific implementation.