Publications

Bieniusa A, Zawirsky M, Preguiça N, Shapiro M, Moreno CB, Balegas V, Duarte S.  2012.  An optimized conflict-free replicated set. CoRR. 1210.3368(8083):9. Abstract1210.3368v1.pdf

Eventual consistency of replicated data supports concurrent updates, reduces latency and improves fault tolerance, but forgoes strong consistency. Accordingly, several cloud computing platforms implement eventually-consistent data types. The set is a widespread and useful abstraction, and many replicated set designs have been proposed. We present a reasoning abstraction, permutation equivalence, that systematizes the characterization of the expected concurrency semantics of concurrent types. Under this framework we present one of the existing conflict-free replicated data types, Observed-Remove Set. Furthermore, in order to decrease the size of meta-data, we propose a new optimization to avoid tombstones. This approach that can be transposed to other data types, such as maps, graphs or sequences.

Shapiro M, Preguiça N, Moreno CB, Zawirsky M, others.  2011.  A comprehensive study of convergent and commutative replicated data types. :47. Abstracttechreport.pdf

Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs, montonic DAGs,
and sequences. It discusses some properties needed to implement non-trivial CRDTs.

Shapiro M, Preguiça N, Moreno CB, Zawirsky M.  2011.  Conflict-free replicated data types. Stabilization, Safety, and Security of Distributed Systems. :386–400. Abstractrr-7687.pdf

Replicating data under Eventual Consistency (EC) allows any replica to accept updates without remote synchronisation. This ensures performance and scalability in largescale distributed systems (e.g., clouds). However, published EC approaches are ad-hoc and error-prone. Under a formal Strong Eventual Consistency (SEC) model, we study sufficient conditions for convergence. A data type that satisfies these conditions is called a Conflictfree Replicated Data Type (CRDT). Replicas of any CRDT are guaranteed to converge in a self-stabilising manner, despite any number of failures. This paper formalises two popular approaches (state- and operation-based) and their relevant sufficient conditions. We study a number of useful CRDTs, such as sets with clean semantics, supporting both add and remove operations, and consider in depth the more complex Graph data type. CRDT types can be composed to develop large-scale distributed applications, and have interesting theoretical properties.

Almeida PS, Moreno CB, Farach-Colton M, Jesus P, Mosteiro M.  2011.  Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution. :513-527. Abstractmdfu.pdf

Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) protocols, are not resilient to communication failures (or message loss) because such failures yield a loss of mass. In this paper, we present a protocol which we call Mass-Distribution with Flow-Updating (MDFU). We obtain MDFU by applying FU techniques to classic MD. We analyze the convergence time of MDFU showing that stochastic message loss produces low overhead. This is the first convergence proof of an FU-based algorithm. We evaluate MDFU experimentally, comparing it with previous MD and FU protocols, and verifying the behavior predicted by the analysis. Finally, given that MDFU incurs a fixed deviation proportional to the message-loss rate, we adjust the accuracy of MDFU heuristically in a new protocol called MDFU with Linear Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave very well in practice, even under high rates of message loss and even changing the input values dynamically.

Jesus P, Moreno CB, Almeida PS.  2011.  A Survey of Distributed Data Aggregation Algorithms. Arxiv preprint arXiv:1110.0725. :45. Abstract1110.0725.pdf

Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, that can then be used to direct the execution of other applications. The resulting values result from the distributed computation of functions like COUNT, SUM and AVERAGE. Some application examples can found to determine the network size, total storage capacity, average load, majorities and many others.
In the last decade, many different approaches have been proposed, with different trade-offs in terms of accuracy, reliability, message and time complexity. Due to the considerable amount and variety of aggregation algorithms, it can be difficult and time consuming to determine which techniques will be more appropriate to use in specific settings, justifying the existence of a survey to aid in this task.
This work reviews the state of the art on distributed data aggregation algorithms, providing three main contributions. First, it formally defines the concept of aggregation, characterizing the different types of aggregation functions. Second, it succinctly describes the main aggregation techniques, organizing them in a taxonomy. Finally, it provides some guidelines toward the selection and use of the most relevant techniques, summarizing their principal characteristics.

Preguiça N, Moreno CB, Almeida PS, Fonte V, Gonçalves R.  2010.  Dotted Version Vectors: Logical Clocks for Optimistic Replication. CoRR. 1011.5808 Abstractdvv-arxiv.pdf

In cloud computing environments, a large number of users access data stored in highly available storage systems. To provide good performance to geographically disperse users and allow operation even in the presence of failures or network partitions, these systems often rely on optimistic replication solutions that guarantee only eventual consistency. In this scenario, it is important to be able to accurately and efficiently identify updates executed concurrently. In this paper, first we review, and expose problems with current approaches to causality tracking in optimistic replication: these either lose information about causality or do not scale, as they require replicas to maintain information that grows linearly with the number of clients or updates.Then, we propose a novel solution that fully captures causality while being very concise in that it maintains information that grows linearly only with the number of servers that register updates for a given data element, bounded by the degree of replication.

Moreno CB, Lopes N.  2004.  B Trees on P2P: Providing content indexing over DHT overlays. :1-5. Abstractbtp2p-techrep_2.pdf

The ability to search by content has been at the core of P2P data sharing systems and is a fundamental tool in the modern Web. However, currently deployed P2P search technology still suffers from either excessive centralization, abuse of network resources or low accuracy.
Efcient overlay structuring systems, like distributed hash tables (DHTs), provide adequate solutions to content location as long as unique identiers are used. They cannot, however, directly support search without negative impacts on the load balance of data distribution among peer nodes. We will show that DHTs can be used as a base for efcient content indexing by building a BTree structure that coordinates the use of homogeneous size blocks,
compatible with the DHT load balance assumptions. The remaining of the paper is dedicated
to a discussion of some of the issues, problems and possible solutions, that need to considered when building complex data structures on top of a peer-to-peer DHT layer.

Moreno CB, Moura F.  1997.  Specification of convergent abstract data types for autonomous mobile computing. Distributed Systems Group, Minho University. :18. Abstractscadt4.pdf

Abstract Traditional replica control mechanisms such as quorum consensus, primary replicas and other strong consistency approaches are unable to provide a useful level of availability on unconstrained mobile environments. We define an environment thats exploits pair-wise communication and allows autonomous creation and joining of replicas while ensuring eventual convergence. A set of composable components (ADTs) are formally specified using the SETS Calculus. These components can be used to build simple distributed applications that take advantage of peer-to-peer communication between mobile hosts.

Jesus P.  2012.  Robust Distributed Data Aggregation. Abstractpaulo_cesar_de_oliveira_jesus.pdf

Distributed aggregation algorithms are an important building block of modern large scale systems, as it allows the determination of meaningful system-wide properties (e.g., network size, total storage capacity, average load, or majorities) which are required to direct the execution of distributed applications. In the last decade, several algorithms have been proposed to address the distributed computation of aggregation functions (e.g., COUNT, SUM, AVERAGE, and MAX/MIN), exhibiting different properties in terms of accuracy, speed and communication tradeoffs. However, existing approaches exhibit many issues when challenged in faulty and dynamic environments, lacking in terms of fault-tolerance and support to churn.
This study details a novel distributed aggregation approach, named Flow Updating, which is fault-tolerant and able to operate on dynamics networks. The algorithm is based on manipulating flows (inspired by the concept from graph theory), that are updated using idempotent messages, providing it with unique robustness capabilities. Experimental results showed that Flow Updating outperforms previous averaging algorithms in terms of time and message complexity, and unlike them it self adapts to churn and changes of the initial input values without requiring any periodic restart, supporting node crashes and high levels of message loss.
In addition to this main contribution, others can also be found in this research work, namely: a definition of the aggregation problem is proposed; existing distributed aggregation algorithm are surveyed and classified into a comprehensive taxonomy; a novel algorithm is introduced, based on Flow Updating, to estimate the Cumulative Distribution Function (CDF) of a global system attribute.
It is expected that this work will constitute a relevant contribution to the area of distributed computing, in particular to the robust distributed computation of aggregation functions in dynamic networks.