Publications

Moreno CB, Almeida PS, Shoker A.  2014.  Making Operation-Based CRDTs Operation-Based. The International Conference on Distributed Applications and Interoperable Systems - DAIS. 8460:126–140. AbstractBest paper (joint)

Conflict-free Replicated Datatypes (CRDT) can simplify the design of eventually consistent systems. They can be classified into state- based or operation-based. Operation-based designs have the potential for allowing very compact solutions in both the sent messages and the object state size. Unfortunately, the current approaches are still far from this objective. In this paper, we introduce a new ‘pure’ operation-based frame- work that makes the design and the implementation of these CRDTs more simple and efficient. We show how to leverage the meta-data of the messaging middleware to design very compact CRDTs, while only disseminating operation names and their optional arguments.

Moreno CB, Gonçalves N, José R.  2014.  Collaborative and Privacy- Aware Sensing for Observing Urban Movement Patterns. 8th DPM International Workshop on Data Privacy Management. 8247 Abstract

The information infrastructure that pervades urban environments represents a major opportunity for collecting information about Human mobility that would be very important across many application domains. However, this huge potential has been undermined by the overwhelming privacy risks that are associated with such forms of large scale sensing. In this research, we are concerned with the problem of how to enable a set of autonomous sensing nodes, e.g. a Bluetooth scanner or a Wi-Fi hotspot, to collaborate in the observation of movement patterns of individuals without compromising their privacy. We describe a novel technique that generates Precedence Filters and allows probabilistic estimations of sequences of visits to monitored locations and we demonstrate how this technique can combine plausible deniability by an individual with valuable information about aggregate movement patterns. The results provide a promising step towards the application of new stochastic techniques in large scale sensing.

Lima R, Moreno CB, Miranda H.  2013.  Broadcast Cancellation in Search Mechanisms. Proceedings of the 28th Annual ACM Symposium on Applied Computing - SAC. Abstractsac2013_2.pdf

Searching for resources over unstructured networks is usually supported by broadcast communication primitives. Ideally, the broadcast process should be cancelled as soon as possible after a successful discovery, to avoid ooding the entire network. However, cancelling an ongoing broadcast is challenging and may increase the number of exchanged messages.
In this paper, we compare the cancellation mechanisms used by BERS and BERS? With new proposed cancellation approaches BCIR and BCIR? The formulation of a simpli ed analytical model and the simulation results show that:i)it is possible to reduce the number of retransmitted messages, without increasing the latency observed in BERS?; and ii) BCIR is more energy ecient, which can contribute to extend the availability of mobile battery powered devices.

Terelius H, Varagnolo D, Moreno CB, Johansson KH.  2013.  Fast distributed estimation of empirical mass functions over anonymous networks. IEEE 52nd Annual Conference on Decision and Control (CDC). Abstractcdc2013.pdf

The aggregation and estimation of values over networks is fundamental for distributed applications, such as wireless sensor networks. Estimating the average, minimal and maximal values has already been extensively studied in the literature. In this paper, we focus on estimating empirical distributions of values in a network with anonymous agents. In particular, we compare two different estimation strategies in terms of their convergence speed, accuracy and communication costs. The first strategy is deterministic and based on the average consensus protocol, while the second strategy is probabilistic and based on the max consensus protocol.

Moreno CB, Almeida PS, Borges M, Jesus P.  2012.  Spectra: Robust estimation of distribution functions in networks. Distributed Applications and Interoperable Systems. 7272:96–103. Abstract1204.1373.pdf

The distributed aggregation of simple aggregates such as minima/maxima, counts, sums and averages have been studied in the past and are important tools for distributed algorithms and network co- ordination. Nonetheless, this kind of aggregates may not be comprehen- sive enough to characterize biased data distributions or when in presence of outliers, making the case for richer estimates.
This work presents Spectra, a distributed algorithm for the estimation of distribution functions over large scale networks. The estimate is available at all nodes and the technique depicts important properties: robustness when exposed to high levels of message loss, fast convergence speed and fine precision in the estimate. It can also dynamically cope with changes of the sampled local property and with churn, without requiring restarts. The proposed approach is experimentally evaluated and contrasted to a competing state of the art distribution aggregation technique.

Lima R, Moreno CB, Miranda H.  2012.  Stopping on going broadcasts in large MANETs. Proceedings of the 1st European Workshop on AppRoaches to MObiquiTous Resilience - ARMOR. 4:4. Abstractc9.pdf

Broadcast is a communication primitive building block widely used in mobile ad-hoc networks (MANETs) for the exchange of control packets and resource location for upper level services such as routing and management protocols. Flooding is the most simple broadcast algorithm, but it wastes a lot of energy and bandwidth, as flooding leads to many redundant radio transmissions. An optimization to flooding is to contain it, once the resource has been found. In this paper, we compare the impact on the latency and power consumption of four competing approaches for flooding containment. The results show that stopping ongoing broadcasts can achieve promising performance increases over other flooding base techniques, when applied in large scale MANETs with scarce power resources. In addition, results show that both network topology and the number of copies of the resource influence differently the performance of each searching approach.

Moreno CB, Almeida PS, Fonte V, Preguiça N, Gonçalves R.  2012.  Brief announcement: efficient causality tracking in distributed storage systems with dotted version vectors. Proceedings of the symposium on Principles of distributed computing - PODC. :335–336. Abstractp335-preguica.pdf

Version vectors (VV) are used pervasively to track dependencies between replica versions in multi-version distributed storage systems. In these systems, VV tend to have a dual functionality: identify a version and encode causal dependencies. In this paper, we show that by maintaining the identifier of the version separate from the causal past, it is possible to verify causality in constant time (instead of O(n) for VV) and to precisely track causality with information with size bounded by the degree of replication, and not by the number of concurrent writers.

Moreno CB, Bieniusa A, Zawirsky M, Preguiça N, Shapiro M, Balegas V, Duarte S.  2012.  Brief announcement: Semantics of eventually consistent replicated sets. Proceedings of the 26th international conference on Distributed Computing - ICDCS . 7611:441–442. Abstractsemantics-set

This paper studies the semantics of sets under eventual consistency. The set is a pervasive data type, used either directly or as a component of more complex data types, such as maps or graphs. Eventual consistency of replicated data supports concurrent updates, reduces latency and improves fault tolerance, but forgoes strong consistency (e.g., linearisability). Accordingly, several cloud computing platforms implement eventually-consistent replicated sets [2,4].

Almeida PS, Moreno CB, Cunha A.  2012.  A Fast Distributed Computation of Distances in Networks. IEEE 51st Annual Conference on Decision and Control - CDC. :5215–5220. Abstractcdc12-1.pdf

This paper presents a distributed algorithm to simultaneously compute the diameter, radius and node eccentricity in all nodes of a synchronous network. Such topological information may be useful as input to configure other algorithms. Previous approaches have been modular, progressing in sequential phases using building blocks such as BFS tree construction, thus incurring longer executions than strictly required. We present an algorithm that, by timely propagation of available estimations, achieves a faster convergence to the correct values. We show local criteria for detecting convergence in each node. The algorithm avoids the creation of BFS trees and simply manipulates sets of node ids and hop counts. For the worst scenario of variable start times, each node i with eccentricity ecc(i) can compute: the node eccentricity in diam(G)+ecc(i)+2 rounds; the diameter in 2 diam(G)+ecc(i)+2 rounds; and the radius in diam(G) + ecc(i) + 2 radius(G) rounds.

Almeida PS, Moreno CB, Farach-Colton M, Jesus P, Mosteiro M.  2011.  Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution. Principles of Distributed Systems. 7109:513–527. Abstract1109.4373v1.pdf

Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) protocols, are not resilient to communication failures (or message loss) because such failures yield a loss of mass.

In this paper, we present a protocol which we call Mass-Distribution with Flow-Updating (MDFU). We obtain MDFU by applying FU techniques to classic MD. We analyze the convergence time of MDFU showing that stochastic message loss produces low overhead. This is the first convergence proof of an FU-based algorithm. We evaluate MDFU experimentally, comparing it with previous MD and FU protocols, and verifying the behavior predicted by the analysis. Finally, given that MDFU incurs a fixed deviation proportional to the message-loss rate, we adjust the accuracy of MDFU heuristically in a new protocol called MDFU with Linear Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave very well in practice, even under high rates of message loss and even changing the input values dynamically.

Gonçalves R, Almeida PS, Moreno CB, Fonte V, Preguiça N.  2011.  Evaluating Dotted Version Vectors in Riak. INForum - Simpósio de Informática. :474–479. Abstractmain.pdf

The NoSQL movement is rapidly increasing in importance, acceptance and usage in major (web) applications, that need the partition-tolerance and availability of the CAP theorem for scalability purposes, thus sacrificing the consistency side. With this approach, paradigms such as Eventual Consistency became more widespread. An eventual consistent system must handle data divergence and conflicts, that have to be carefully accounted for. Some systems have tried to use classic Version Vectors (VV) to track causality, but these reveal either scalability problems or loss of accuracy (when pruning is used to prevent vector growth).

Dotted Version Vectors (DVV) is a novel mechanism for dealing with data versioning in eventual consistent systems, that allows both accurate causality tracking and scalability both in the number of clients and servers, while limiting vector size to replication degree.

In this paper we describe briefly the challenges faced when incorporating DVV in Riak (a distributed key-value store), evaluate its behavior and performance, and discuss the advantages and disadvantages of this specific implementation.

Borges M, Moreno CB, Jesus P, Almeida PS.  2011.  Estimativa Contínua e Tolerante a Faltas de Funções Distribuição Cumulativa em Redes de Larga Escala. INForum - Simpósio de Informática. Abstractsimposio_de_informatica_inforum_2011_borges.pdf

Em ambientes descentralizados de larga escala como é o caso das redes de sensores sem fios, P2P e outras, a recolha de dados é praticável apenas se houver agregação dos mesmos. No estado da arte actual, as estratégias de resolução desta questão não são satisfatórias dado exigirem que os protocolos sejam reiniciados sempre que os valores iniciais mudam ou quando o rácio de entrada/saída de nodos é não nulo. Acresce que nenhuma estratégia satisfaz a estimativa de CDFs (funções de dis- tribuição cumulativa). A abordagem apresentada neste trabalho mostra como é possível o cálculo de funções de distribuição cumulativa em redes distribuídas, permitindo o seguimento dinâmico dos valores amostrados, sem que seja necessário reiniciar o protocolo em condições adversas de funcionamento da rede (perda de mensagens, entrada/saída de nodos). Esta abordagem é baseada num protocolo de averaging tolerante a faltas. Com este trabalho pretende-se também contribuir com uma estratégia que reduz o custo de comunicação entre nodos, com base em decisões locais e sensível à taxa de variação de estimativas. Os resultados de simulação mostram a resiliência deste protocolo, permitindo a estimativa contínua de CDFs em presença de dinamismo (perda de mensagens, alteração de valor amostrado, churn). Mostram também a convergência rápida na estimativa de CDFs para diferentes configurações de rede.

Moreno CB, Jin D, He D, Liu D.  2010.  Genetic algorithm with local search for community mining in complex networks. 22nd International Conference on Tools with Artificial Intelligence - ICTAI. 1:105–112. Abstractgh36

Detecting communities from complex networks has triggered considerable attention in several application domains. Targeting this problem, a local search based genetic algorithm
(GALS) which employs a graph-based representation (LAR) has been proposed in this work. The core of the GALS is a local search based mutation technique. Aiming to overcome the drawbacks of the existing mutation methods, a concept called marginal gene has been proposed, and then an effective and efficient mutation method, combined with a local search strategy which is based on the concept of marginal gene, has also been proposed by analyzing the modularity function. Moreover, in this paper the percolation theory on ER random graphs is employed to further clarify the effectiveness of LAR presentation; A Markov random walk based method is adopted to produce an accurate and diverse initial population; the solution space of GALS will be significantly reduced by using a graph based mechanism. The proposed GALS has been tested on both computer-generated and real-world networks, and compared with some competitive community mining algorithms. Experimental result has shown that GALS is hig y effective and efficient for discovering community structure.

Jesus P, Moreno CB, Almeida PS.  2010.  Fault-Tolerant Aggregation for Dynamic Networks. 29th Symposium on Reliable Distributed Systems (SRDS). :37–43. Abstractfu_dynamic_agg.pdf

Data aggregation is a fundamental building block of modern distributed systems. Averaging based approaches, commonly designated gossip-based, are an important class of aggregation algorithms as they allow all nodes to produce a result, converge to any required accuracy, and work independently from the network topology. However, existing approaches exhibit many dependability issues when used in faulty and dynamic environments. This paper extends our own technique, Flow Updating, which is immune to message loss, to operate in dynamic networks, improving its fault tolerance characteristics. Experimental results show that the novel version of Flow Updating vastly outperforms previous averaging algorithms, it self adapts to churn without requiring any periodic restart, supporting node crashes and high levels of message loss.

Moreno CB, Almeida PS, Menezes R.  2009.  Fast estimation of aggregates in unstructured networks. Fifth International Conference on Autonomic and Autonomous Systems - ICAS. :88–93. Abstractieeefastfinalicas2009.pdf

Aggregation of data values plays an important role on distributed computations, in particular over peer-to-peer and sensor networks, as it can provide a summary of some global system property and direct the actions of self-adaptive distributed algorithms. Examples include using estimates of the network size to dimension distributed hash tables or estimates of the average system load to direct loadbalancing. Distributed aggregation using non-idempotent functions, like sums, is not trivial as it is not easy to prevent a given value from being accounted for multiple times; this is especially the case if no centralized algorithms or global identifiers can be used.This paper introduces Extrema Propagation, a probabilistic technique for distributed estimation of the sum of positive real numbers. The technique relies on the exchange of duplicate insensitive messages and can be applied in flood and/or epidemic settings, where multi-path routing occurs; it is tolerant of message loss; it is fast, as the number of message exchange steps equals the diameter; and it is fully distributed, with no single point of failure and the result produced at every node.