Publications

Almeida PS, Moreno CB, Fonte V.  2007.  Improving on version stamps. On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops. 4806:1025–1031. Abstract

Optimistic distributed systems often rely on version vectors or their variants in order to track updates on replicated objects. Some of these mechanisms rely on some form of global configuration or distributed naming protocol in order to assign unique identifiers to each replica. These approaches are incompatible with replica creation under arbitrary partitions, a typical operation mode in mobile or poorly connected environments. Other mechanisms assign unique identifiers relying on statistical correctness. In previous work we have introduced an update tracking mechanism that overcomes these limitations. This paper presents results from recent experimentation, that brought to surface a particular pattern of operation that results in an unforeseen, unlimited growth in space consumption. We also describe informally a new update tracking mechanism that does not exhibit this pathological growth while providing guaranteed unique identifiers for a dynamic number of replicas under arbitrary partitions and the same functionality of version vectors.

Noble J, Vitek J, Lea D, Almeida PS.  1999.  Aliasing in object oriented systems. Object-Oriented Technology ECOOP’99 Workshop Reader. :793–793. Abstract

This chapter contains summaries of the presentations given at the Intercontinental Workshop on Aliasing in Object-Oriented Systems (IWAOOS’99) at the European Conference on Object-Oriented Programming (ECOOP’99) which was held in Lisbon, Portugal on June 15, 1999.

Almeida PS.  1997.  Balloon types: Controlling sharing of state in data types. ECOOP'97—Object-Oriented Programming. :32–59. Abstract

Current data abstraction mechanisms are not adequate to control sharing of state in the general case involving objects in linked structures. The pervading possibility of sharing is a source of errors and an obstacle to language implementation techniques.
We present a general extension to programming languages which makes the ability to share state a first class property of a data type, resolving a long-standing flaw in existing data abstraction mechanisms.
Balloon types enforce a strong form of encapsulation: no state reachable (directly or transitively) by a balloon object is referenced by any external object. Syntactic simplicity is achieved by relying on a non-trivial static analysis as the checking mechanism.
Balloon types are applicable in a wide range of areas such as program transformation, memory management and distributed systems. They are the key to obtaining self-contained composite objects, truly opaque data abstractions and value types—important concepts for the development of large scale, provably correct programs.

Enes V, Moreno CB, Almeida PS, Leitão J.  2017.  Borrowing an Identity for a Distributed Counter. PaPoC '17 Proceedings of the 3rd Workshop on the Principles and Practice of Consistency for Distributed Data. a5-enes.pdf
Younes G, Almeida PS, Moreno CB.  2017.  Compact Resettable Counters through Causal Stability. PaPoC '17 Proceedings of the 3rd Workshop on the Principles and Practice of Consistency for Distributed Data. a3-younes.pdf
Gonçalves R, Almeida PS, Moreno CB, Fonte V.  2017.  DottedDB: Anti-Entropy without Merkle Trees, Deletes without Tombstones. 36th IEEE International Symposium on Reliable Distributed Systems . Abstractdotteddb_srds.pdf

To achieve high availability in the face of network partitions, many distributed databases adopt eventual consistency, allow temporary conflicts due to concurrent writes, and use some form of per-key logical clock to detect and resolve such conflicts. Furthermore, nodes synchronize periodically to ensure replica convergence in a process called anti-entropy, normally using Merkle Trees. We present the design of DottedDB, a Dynamo-like key-value store, which uses a novel node-wide logical clock framework, overcoming three fundamental limitations of the state of the art: (1) minimize the metadata per key necessary to track causality, avoiding its growth even in the face of node churn; (2) correctly and durably delete keys, with no need for tombstones; (3) offer a lightweight anti-entropy mechanism to converge replicated data, avoiding the need for Merkle Trees. We evaluate DottedDB against MerkleDB, an otherwise identical database, but using per-key logical clocks and Merkle Trees for anti-entropy, to precisely measure the impact of the novel approach. Results show that: causality metadata per object always converges rapidly to only one id-counter pair; distributed deletes are correctly achieved without global coordination and with constant metadata; divergent nodes are synchronized faster, with less memory-footprint and with less communication overhead than using Merkle Trees.

Kassam Z, Shoker A, Almeida PS, Moreno CB.  2017.  Aggregation Protocols in Light of Reliable Communication. The 16th IEEE International Symposium on Network Computing and Applications (NCA 2017). Aggregation Protocols in Light of Reliable Communication
Moreno CB, Almeida PS, Lerche C.  2016.  The problem with embedded CRDT counters and a solution. PaPoC '16 Proceedings of the 2nd Workshop on the Principles and Practice of Consistency for Distributed Data. abstractcounterpapocfinal.pdf
Almeida PS, Shoker A, Moreno CB.  2015.  Efficient State-based CRDTs by Delta-Mutation. In the Proceedings of the International Conference on NETworked sYStems}. 9466 Abstractdeltacrdt.pdf

CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the entire state, that may be large, and merging it to other replicas; whereas operation-based CRDTs disseminate operations (i.e., small states) assuming an exactly-once reliable dissemination layer. We introduce Delta State Conflict-Free Replicated Datatypes (δ-CRDT) that can achieve the best of both worlds: small messages with an incremental nature, as in operation-based CRDTs, disseminated over unreliable communication channels, as in traditional state-based CRDTs. This is achieved by defining δ-mutators to return a delta-state, typically with a much smaller size than the full state, that is joined to both: local and remote states. We introduce the δ-CRDT framework, and we explain it through establishing a correspondence to current state-based CRDTs. In addition, we present an anti-entropy algorithm that ensures causal consistency, and we introduce two δ-CRDT specifications of well-known replicated datatypes.

Almeida PS, Moreno CB, Fonte V, Gonçalves R.  2015.  Concise Server-Wide Causality Management for Eventually Consistent Data Stores. DAIS - IFIP International Conference on Distributed Applications and Interoperable Systems. Abstractglobal_logical_clocks.pdf

Large scale distributed data stores rely on optimistic replication to scale and remain highly available in the face of network partitions. Managing data without coordination results in eventually consistent data stores that allow for concurrent data updates. These systems often use anti-entropy mechanisms (like Merkle Trees) to detect and repair divergent data versions across nodes. However, in practice hash-based data structures are too expensive for large amounts of data and create too many false conflicts. Another aspect of eventual consistency is detecting write conflicts. Logical clocks are often used to track data causality, necessary to detect causally concurrent writes on the same key. However, there is a nonnegligible metadata overhead per key, which also keeps growing with time, proportional with the node churn rate. Another challenge is deleting keys while respecting causality: while the values can be deleted, perkey metadata cannot be permanently removed without coordination. We introduce a new causality management framework for eventually consistent data stores, that leverages node logical clocks (Bitmapped Version Vectors) and a new key logical clock (Dotted Causal Container) to provides advantages on multiple fronts: 1) a new efficient and lightweight anti-entropy mechanism; 2) greatly reduced per-key causality metadata size; 3) accurate key deletes without permanent metadata.

Almeida PS, Moreno CB, Gonçalves R, Preguiça N, Fonte V.  2014.  Scalable and Accurate Causality Tracking for Eventually Consistent Stores. Distributed Applications and Interoperable Systems. 8460 Abstractdvvset-dais.pdf

In cloud computing environments, data storage systems often rely on optimistic replication to provide good performance and availability even in the presence of failures or network partitions. In this scenario, it is important to be able to accurately and efficiently identify updates executed concurrently. Current approaches to causality tracking in optimistic replication have problems with concurrent updates: they either (1) do not scale, as they require replicas to maintain information that grows linearly with the number of writes or unique clients; (2) lose information about causality, either by removing entries from client-id based version vectors or using server-id based version vectors, which cause false conflicts. We propose a new logical clock mechanism and a logical clock framework that together support a traditional key-value store API, while capturing causality in an accurate and scalable way, avoiding false conflicts. It maintains concise information per data replica, only linear on the number of replica servers, and allows data replicas to be compared and merged linear with the number of replica servers and versions.

Moreno CB, Almeida PS, Shoker A.  2014.  Making Operation-Based CRDTs Operation-Based. The International Conference on Distributed Applications and Interoperable Systems - DAIS. 8460:126–140. AbstractBest paper (joint)

Conflict-free Replicated Datatypes (CRDT) can simplify the design of eventually consistent systems. They can be classified into state- based or operation-based. Operation-based designs have the potential for allowing very compact solutions in both the sent messages and the object state size. Unfortunately, the current approaches are still far from this objective. In this paper, we introduce a new ‘pure’ operation-based frame- work that makes the design and the implementation of these CRDTs more simple and efficient. We show how to leverage the meta-data of the messaging middleware to design very compact CRDTs, while only disseminating operation names and their optional arguments.

Moreno CB, Almeida PS, Borges M, Jesus P.  2012.  Spectra: Robust estimation of distribution functions in networks. Distributed Applications and Interoperable Systems. 7272:96–103. Abstract1204.1373.pdf

The distributed aggregation of simple aggregates such as minima/maxima, counts, sums and averages have been studied in the past and are important tools for distributed algorithms and network co- ordination. Nonetheless, this kind of aggregates may not be comprehen- sive enough to characterize biased data distributions or when in presence of outliers, making the case for richer estimates.
This work presents Spectra, a distributed algorithm for the estimation of distribution functions over large scale networks. The estimate is available at all nodes and the technique depicts important properties: robustness when exposed to high levels of message loss, fast convergence speed and fine precision in the estimate. It can also dynamically cope with changes of the sampled local property and with churn, without requiring restarts. The proposed approach is experimentally evaluated and contrasted to a competing state of the art distribution aggregation technique.

Moreno CB, Almeida PS, Fonte V, Preguiça N, Gonçalves R.  2012.  Brief announcement: efficient causality tracking in distributed storage systems with dotted version vectors. Proceedings of the symposium on Principles of distributed computing - PODC. :335–336. Abstractp335-preguica.pdf

Version vectors (VV) are used pervasively to track dependencies between replica versions in multi-version distributed storage systems. In these systems, VV tend to have a dual functionality: identify a version and encode causal dependencies. In this paper, we show that by maintaining the identifier of the version separate from the causal past, it is possible to verify causality in constant time (instead of O(n) for VV) and to precisely track causality with information with size bounded by the degree of replication, and not by the number of concurrent writers.

Almeida PS, Moreno CB, Cunha A.  2012.  A Fast Distributed Computation of Distances in Networks. IEEE 51st Annual Conference on Decision and Control - CDC. :5215–5220. Abstractcdc12-1.pdf

This paper presents a distributed algorithm to simultaneously compute the diameter, radius and node eccentricity in all nodes of a synchronous network. Such topological information may be useful as input to configure other algorithms. Previous approaches have been modular, progressing in sequential phases using building blocks such as BFS tree construction, thus incurring longer executions than strictly required. We present an algorithm that, by timely propagation of available estimations, achieves a faster convergence to the correct values. We show local criteria for detecting convergence in each node. The algorithm avoids the creation of BFS trees and simply manipulates sets of node ids and hop counts. For the worst scenario of variable start times, each node i with eccentricity ecc(i) can compute: the node eccentricity in diam(G)+ecc(i)+2 rounds; the diameter in 2 diam(G)+ecc(i)+2 rounds; and the radius in diam(G) + ecc(i) + 2 radius(G) rounds.