Reports

Almeida PS, Shoker A, Moreno CB.  2015.  Efficient State-based CRDTs by Delta-Mutation. CoRR. abs/1410.2803:1-19. Abstractdelta-tr.pdf

CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs achieve this by sharing local state changes through shipping the entire state, that is then merged to other replicas with an idempotent, associative, and commutative join operation, ensuring convergence. This imposes a large communication overhead as the state size becomes larger. We introduce Delta State Conflict-Free Replicated Datatypes ({\delta}-CRDT), which make use of {\delta}-mutators, defined in such a way to return a delta-state, typically, with a much smaller size than the full state. Delta-states are joined to the local state as well as to the remote states (after being shipped). This can achieve the best of both worlds: small messages with an incremental nature, as in operation-based CRDTs, disseminated over unreliable communication channels, as in traditional state-based CRDTs. We introduce the {\delta}-CRDT framework, and we explain it through establishing a correspondence to current state- based CRDTs. In addition, we present two anti-entropy algorithms: a basic one that provides eventual convergence, and another one that ensures both convergence and causal consistency. We also introduce two {\delta}-CRDT specifications of well-known replicated datatypes.

Moreno CB, Almeida PS, Cunha A, Ferreira C.  2015.  Composition of State-based CRDTs. Abstractcrdtcompositionreport.pdf

State-based CRDTs are rooted in mathematical structures called join-semilattices (more simply lattices, in this context). These order structures ensure that the replicated states of the defined data types evolve and increase in a partial order in a sufficiently defined way, so as to ensure that all concurrent evolutions can be merged deterministically. In order to build, or understand the building principles, of state-based CRDTs it is necessary to understand the basic building blocks of the support lattices and how lattices can be composed.

Jin D, Yang B, Moreno CB, Liu D, He D, Liu J.  2013.  Markov random walk under constraint for discovering overlapping communities in complex networks. :21. Abstract1303.5675.pdf

The detection of overlapping communities in complex networks has motivated recent research in relevant fields. Aiming to address this problem, we propose a Markov-dynamics-based algorithm, called UEOC, which means 'unfold and extract overlapping communities'. In UEOC, when identifying each natural community that overlaps, a Markov random walk method combined with a constraint strategy, which is based on the corresponding annealed network (degree conserving random network), is performed to unfold the community. Then, a cutoff criterion with the aid of a local community function, called conductance, which can be thought of as the ratio between the number of edges inside the community and those leaving it, is presented to extract this emerged community from the entire network. The UEOC algorithm depends on only one parameter whose value can be easily set, and it requires no prior knowledge of the hidden community structures. The proposed UEOC has been evaluated both on synthetic benchmarks and on some real-world networks, and has been compared with a set of competing algorithms. The experimental result has shown that UEOC is highly effective and efficient for discovering overlapping communities.

Zawirsky M, Bieniusa A, Balegas V, Duarte S, Moreno CB, Shapiro M, Preguiça N.  2013.  SwiftCloud: Fault-tolerant geo-replication integrated all the way to the client machine. :24. Abstract1310.3107v1.pdf

Client-side logic and storage are increasingly used in web and mobile applications to improve response time and availability. Current approaches tend to be ad-hoc and poorly integrated with the server-side logic. We present a principled approach to integrate client- and server-side storage. We support mergeable and strongly consistent transactions that target either client or server replicas and provide access to causally-consistent snapshots efficiently. In the presence of infrastructure faults, a client-assisted failover solution allows client execution to resume immediately and seamlessly access consistent snapshots without waiting. We implement this approach in SwiftCloud, the first transactional system to bring geo-replication all the way to the client machine. Example applications show that our programming model is useful across a range of application areas. Our experimental evaluation shows that SwiftCloud provides better fault tolerance and at the same time can improve both latency and throughput by up to an order of magnitude, compared to classical geo-replication techniques.

Almeida PS, Moreno CB.  2013.  Scalable eventually consistent counters over unreliable networks. :1-32. Abstract1307.3207v1_1.pdf

Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network "likes". Classic distributed counters, strongly consistent, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios. This paper defines Eventually Consistent Distributed Counters (ECDC) and presents an implementation of the concept, Handoff Counters, that is scalable and works over unreliable networks. By giving up the sequencer aspect of classic distributed counters, ECDC implementations can be made AP in the CAP design space, while retaining the essence of counting. Handoff Counters are the first CRDT (Conflict-free Replicated Data Type) based mechanism that overcomes the identity explosion problem in naive CRDTs, such as G-Counters (where state size is linear in the number of independent actors that ever incremented the counter), by managing identities towards avoiding global propagation and garbage collecting temporary entries. The approach used in Handoff Counters is not restricted to counters, being more generally applicable to other data types with associative and commutative operations.

Bieniusa A, Zawirsky M, Preguiça N, Shapiro M, Moreno CB, Balegas V, Duarte S.  2012.  An optimized conflict-free replicated set. CoRR. 1210.3368(8083):9. Abstract1210.3368v1.pdf

Eventual consistency of replicated data supports concurrent updates, reduces latency and improves fault tolerance, but forgoes strong consistency. Accordingly, several cloud computing platforms implement eventually-consistent data types. The set is a widespread and useful abstraction, and many replicated set designs have been proposed. We present a reasoning abstraction, permutation equivalence, that systematizes the characterization of the expected concurrency semantics of concurrent types. Under this framework we present one of the existing conflict-free replicated data types, Observed-Remove Set. Furthermore, in order to decrease the size of meta-data, we propose a new optimization to avoid tombstones. This approach that can be transposed to other data types, such as maps, graphs or sequences.

Shapiro M, Preguiça N, Moreno CB, Zawirsky M, others.  2011.  A comprehensive study of convergent and commutative replicated data types. :47. Abstracttechreport.pdf

Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs, montonic DAGs,
and sequences. It discusses some properties needed to implement non-trivial CRDTs.

Shapiro M, Preguiça N, Moreno CB, Zawirsky M.  2011.  Conflict-free replicated data types. Stabilization, Safety, and Security of Distributed Systems. :386–400. Abstractrr-7687.pdf

Replicating data under Eventual Consistency (EC) allows any replica to accept updates without remote synchronisation. This ensures performance and scalability in largescale distributed systems (e.g., clouds). However, published EC approaches are ad-hoc and error-prone. Under a formal Strong Eventual Consistency (SEC) model, we study sufficient conditions for convergence. A data type that satisfies these conditions is called a Conflictfree Replicated Data Type (CRDT). Replicas of any CRDT are guaranteed to converge in a self-stabilising manner, despite any number of failures. This paper formalises two popular approaches (state- and operation-based) and their relevant sufficient conditions. We study a number of useful CRDTs, such as sets with clean semantics, supporting both add and remove operations, and consider in depth the more complex Graph data type. CRDT types can be composed to develop large-scale distributed applications, and have interesting theoretical properties.

Almeida PS, Moreno CB, Farach-Colton M, Jesus P, Mosteiro M.  2011.  Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution. :513-527. Abstractmdfu.pdf

Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) protocols, are not resilient to communication failures (or message loss) because such failures yield a loss of mass. In this paper, we present a protocol which we call Mass-Distribution with Flow-Updating (MDFU). We obtain MDFU by applying FU techniques to classic MD. We analyze the convergence time of MDFU showing that stochastic message loss produces low overhead. This is the first convergence proof of an FU-based algorithm. We evaluate MDFU experimentally, comparing it with previous MD and FU protocols, and verifying the behavior predicted by the analysis. Finally, given that MDFU incurs a fixed deviation proportional to the message-loss rate, we adjust the accuracy of MDFU heuristically in a new protocol called MDFU with Linear Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave very well in practice, even under high rates of message loss and even changing the input values dynamically.

Jesus P, Moreno CB, Almeida PS.  2011.  A Survey of Distributed Data Aggregation Algorithms. Arxiv preprint arXiv:1110.0725. :45. Abstract1110.0725.pdf

Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, that can then be used to direct the execution of other applications. The resulting values result from the distributed computation of functions like COUNT, SUM and AVERAGE. Some application examples can found to determine the network size, total storage capacity, average load, majorities and many others.
In the last decade, many different approaches have been proposed, with different trade-offs in terms of accuracy, reliability, message and time complexity. Due to the considerable amount and variety of aggregation algorithms, it can be difficult and time consuming to determine which techniques will be more appropriate to use in specific settings, justifying the existence of a survey to aid in this task.
This work reviews the state of the art on distributed data aggregation algorithms, providing three main contributions. First, it formally defines the concept of aggregation, characterizing the different types of aggregation functions. Second, it succinctly describes the main aggregation techniques, organizing them in a taxonomy. Finally, it provides some guidelines toward the selection and use of the most relevant techniques, summarizing their principal characteristics.

Preguiça N, Moreno CB, Almeida PS, Fonte V, Gonçalves R.  2010.  Dotted Version Vectors: Logical Clocks for Optimistic Replication. CoRR. 1011.5808 Abstractdvv-arxiv.pdf

In cloud computing environments, a large number of users access data stored in highly available storage systems. To provide good performance to geographically disperse users and allow operation even in the presence of failures or network partitions, these systems often rely on optimistic replication solutions that guarantee only eventual consistency. In this scenario, it is important to be able to accurately and efficiently identify updates executed concurrently. In this paper, first we review, and expose problems with current approaches to causality tracking in optimistic replication: these either lose information about causality or do not scale, as they require replicas to maintain information that grows linearly with the number of clients or updates.Then, we propose a novel solution that fully captures causality while being very concise in that it maintains information that grows linearly only with the number of servers that register updates for a given data element, bounded by the degree of replication.

Moreno CB, Lopes N.  2004.  B Trees on P2P: Providing content indexing over DHT overlays. :1-5. Abstractbtp2p-techrep_2.pdf

The ability to search by content has been at the core of P2P data sharing systems and is a fundamental tool in the modern Web. However, currently deployed P2P search technology still suffers from either excessive centralization, abuse of network resources or low accuracy.
Efcient overlay structuring systems, like distributed hash tables (DHTs), provide adequate solutions to content location as long as unique identiers are used. They cannot, however, directly support search without negative impacts on the load balance of data distribution among peer nodes. We will show that DHTs can be used as a base for efcient content indexing by building a BTree structure that coordinates the use of homogeneous size blocks,
compatible with the DHT load balance assumptions. The remaining of the paper is dedicated
to a discussion of some of the issues, problems and possible solutions, that need to considered when building complex data structures on top of a peer-to-peer DHT layer.

Moreno CB, Moura F.  1997.  Specification of convergent abstract data types for autonomous mobile computing. Distributed Systems Group, Minho University. :18. Abstractscadt4.pdf

Abstract Traditional replica control mechanisms such as quorum consensus, primary replicas and other strong consistency approaches are unable to provide a useful level of availability on unconstrained mobile environments. We define an environment thats exploits pair-wise communication and allows autonomous creation and joining of replicas while ensuring eventual convergence. A set of composable components (ADTs) are formally specified using the SETS Calculus. These components can be used to build simple distributed applications that take advantage of peer-to-peer communication between mobile hosts.