Journal Articles

Almeida PS, Shoker A, Moreno CB.  2018.  Delta state replicated data types. Journal of Parallel and Distributed Computing. 111:162-173. AbstractDeltaCRDTJournalWebsite


Almeida PS, Moreno CB, Farach-Colton M, Jesus P, Mosteiro MA.  2016.  Fault-tolerant aggregation: Flow-Updating meets Mass-Distribution. Distributed Computing. AbstractFull paper

Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) protocols, are not resilient to communication failures (or message loss) because such failures yield a loss of mass. In this paper, we present a protocol which we call Mass-Distribution with Flow-Updating (MDFU). We obtain MDFU by applying FU techniques to classic MD. We analyze the convergence time of MDFU showing that stochastic message loss produces low overhead. This is the first convergence proof of an FU-based algorithm. We evaluate MDFU experimentally, comparing it with previous MD and FU protocols, and verifying the behavior predicted by the analysis. Finally, given that MDFU incurs a fixed deviation proportional to the message-loss rate, we adjust the accuracy of MDFU heuristically in a new protocol called MDFU with Linear Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave very well in practice, even under high rates of message loss and even changing the input values dynamically.

Jesus P, Moreno CB, Almeida PS.  2015.  Flow updating: Fault-tolerant aggregation for dynamic networks. Journal of Parallel and Distributed Computing. 78:53-64. Abstractjpdcfu.pdf

Data aggregation is a fundamental building block of modern distributed systems. Averaging based pproaches, commonly designated gossip-based, are an important class of aggregation algorithms as they allow all nodes to produce a result, converge to any required accuracy, and work independently from the network topology. However, existing approaches exhibit many dependability issues when used in faulty and dynamic environments. This paper describes and evaluates a fault tolerant distributed aggregation technique, Flow Updating, which overcomes the problems in previous averaging approaches and is able to operate on faulty dynamic networks. Experimental results show that this novel approach outperforms previous averaging algorithms; it self-adapts to churn and input value changes without requiring any periodic restart, supporting node crashes and high levels of message loss, and works in asynchronous networks. Realistic concerns have been taken into account in evaluating Flow Updating, like the use of unreliable failure detectors and asynchrony, targeting its application to realistic environments.

Moreno CB, Jesus P, Almeida PS.  2015.  A Survey of Distributed Data Aggregation Algorithms. IEEE Communications Surveys and Tutorials. 17(1):381-404. Abstract1110.0725.pdf

Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, which can then be used to direct the execution of other applications. The resulting values are derived by the distributed computation of functions like Count, Sum, and Average. Some application examples deal with the determination of the network size, total storage capacity, average load, majorities and many others. In the last decade, many different approaches have been proposed, with different trade-offs in terms of accuracy, reliability, message and time complexity. Due to the considerable amount and variety of aggregation algorithms, it can be difficult and time consuming to determine which techniques will be more appropriate to use in specific settings, justifying the existence of a survey to aid in this task. This work reviews the state of the art on distributed data aggregation algorithms, providing three main contributions. First, it formally defines the concept of aggregation, characterizing the different types of aggregation functions. Second, it succinctly describes the main aggregation techniques, organizing them in a taxonomy. Finally, it provides some guidelines toward the selection and use of the most relevant techniques, summarizing their principal characteristics.

Moreno CB, Almeida PS, Menezes R, Jesus P.  2012.  Extrema propagation: Fast Distributed Estimation of Sums and Network Sizes. IEEE Transactions on Parallel and Distributed Systems. 23(4):668–675. Abstract10.

Aggregation of data values plays an important role on distributed computations, in particular, over peer-to-peer and sensor networks, as it can provide a summary of some global system property and direct the actions of self-adaptive distributed algorithms. Examples include using estimates of the network size to dimension distributed hash tables or estimates of the average system load to direct load balancing. Distributed aggregation using nonidempotent functions, like sums, is not trivial as it is not easy to prevent a given value from being accounted for multiple times; this is especially the case if no centralized algorithms or global identifiers can be used. This paper introduces Extrema Propagation, a probabilistic technique for distributed estimation of the sum of positive real numbers. The technique relies on the exchange of duplicate insensitive messages and can be applied in flood and/or epidemic settings, where multipath routing occurs; it is tolerant of message loss; it is fast, as the number of message exchange steps can be made just slightly above the theoretical minimum; and it is fully distributed, with no single point of failure and the result produced at every node.

Almeida PS, Barbosa MB, Pinto JS, Vieira B.  2010.  Deductive verification of cryptographic software. Innovations in Systems and Software Engineering. 6:203–218. Abstractisse_2010.pdf

We apply state-of-the art deductive verification tools to check security-relevant properties of cryptographic software, including safety, absence of error propagation, and correctness with respect to reference implementations. We also develop techniques to help us in our task, focusing on methods oriented towards increased levels of automation, in scenarios where there are clear obvious limits to such automation. These techniques allow us to integrate automatic proof tools with an interactive proof assistant, where the latter is used off-line to prove once-and-for-all fundamental lemmas about properties of programs. The techniques developed have independent interest for practical deductive verification in general.

Almeida PS, Moreno CB, Preguiça N, Hutchison D.  2007.  Scalable bloom filters. Information Processing Letters. 101:255–261. Abstractdbloom.pdf

Bloom Filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori based on the number of elements to store and the desired false positive probability, being impossible to store extra elements without increasing the false positive probability. This leads typically to a conservative assumption regarding maximum set size, possibly by orders of magnitude, and a consequent space waste. This paper proposes Scalable Bloom Filters, a variant of Bloom Filters that can adapt dynamically to the number of elements stored, while assuring a maximum false positive probability.

Almeida PS, Almeida PS, Moreno CB.  2004.  Bounded version vectors. Distributed Computing - Lecture Notes in Computer Science. 3274:102–116. Abstractdisc04.pdf

Version vectors play a central role in update tracking under optimistic distributed systems, allowing the detection of obsolete or inconsistent versions of replicated data. Version vectors do not have a bounded representation; they are based on integer counters that grow indefinitely as updates occur. Existing approaches to this problem are scarce; the mechanisms proposed are either unbounded or operate only under specific settings. This paper examines version vectors as a mechanism for data causality tracking and clarifies their role with respect to vector clocks. Then, it introduces bounded stamps and proves them to be a correct alternative to integer counters in version vectors. The resulting mechanism, bounded version vectors, represents the first bounded solution to data causality tracking between replicas subject to local updates and pairwise symmetrical synchronization.

Almeida PS.  1999.  Type-checking balloon types. Electronic Notes in Theoretical Computer Science. 20:1–27. Abstracttype-checking_balloon_types_1.pdf

Current data abstraction mechanisms are not adequate to control sharing of state in the general case involving objects in linked structures. The pervading possibility of sharing is a source of errors and an obstacle to language implementation techniques. Balloon types, which we have introduced in [2], are a general extension to programming languages. They make the ability to share state a rst class property of a data type. The balloon invariant expresses a strong form of encapsulation: no state reachable (directly or transitively) by a balloon object is referenced by any external object. In this paper we describe the checking mechanism for balloon types. It relies on a non-trivial static analysis, described as an abstract interpretation. Here we focus in particular on the design of the abstract domain which allows the checking mechanism to work under realistic assumptions regarding possible object aliasing.