Recent Publications

Shapiro M, Preguiça N, Moreno CB, Zawirsky M.  2011.  Conflict-free replicated data types. Stabilization, Safety, and Security of Distributed Systems. :386–400. Abstractrr-7687.pdf

Replicating data under Eventual Consistency (EC) allows any replica to accept updates without remote synchronisation. This ensures performance and scalability in largescale distributed systems (e.g., clouds). However, published EC approaches are ad-hoc and error-prone. Under a formal Strong Eventual Consistency (SEC) model, we study sufficient conditions for convergence. A data type that satisfies these conditions is called a Conflictfree Replicated Data Type (CRDT). Replicas of any CRDT are guaranteed to converge in a self-stabilising manner, despite any number of failures. This paper formalises two popular approaches (state- and operation-based) and their relevant sufficient conditions. We study a number of useful CRDTs, such as sets with clean semantics, supporting both add and remove operations, and consider in depth the more complex Graph data type. CRDT types can be composed to develop large-scale distributed applications, and have interesting theoretical properties.

Almeida PS, Moreno CB, Farach-Colton M, Jesus P, Mosteiro M.  2011.  Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution. :513-527. Abstractmdfu.pdf

Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) protocols, are not resilient to communication failures (or message loss) because such failures yield a loss of mass. In this paper, we present a protocol which we call Mass-Distribution with Flow-Updating (MDFU). We obtain MDFU by applying FU techniques to classic MD. We analyze the convergence time of MDFU showing that stochastic message loss produces low overhead. This is the first convergence proof of an FU-based algorithm. We evaluate MDFU experimentally, comparing it with previous MD and FU protocols, and verifying the behavior predicted by the analysis. Finally, given that MDFU incurs a fixed deviation proportional to the message-loss rate, we adjust the accuracy of MDFU heuristically in a new protocol called MDFU with Linear Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave very well in practice, even under high rates of message loss and even changing the input values dynamically.

Jesus P, Moreno CB, Almeida PS.  2011.  A Survey of Distributed Data Aggregation Algorithms. Arxiv preprint arXiv:1110.0725. :45. Abstract1110.0725.pdf

Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, that can then be used to direct the execution of other applications. The resulting values result from the distributed computation of functions like COUNT, SUM and AVERAGE. Some application examples can found to determine the network size, total storage capacity, average load, majorities and many others.
In the last decade, many different approaches have been proposed, with different trade-offs in terms of accuracy, reliability, message and time complexity. Due to the considerable amount and variety of aggregation algorithms, it can be difficult and time consuming to determine which techniques will be more appropriate to use in specific settings, justifying the existence of a survey to aid in this task.
This work reviews the state of the art on distributed data aggregation algorithms, providing three main contributions. First, it formally defines the concept of aggregation, characterizing the different types of aggregation functions. Second, it succinctly describes the main aggregation techniques, organizing them in a taxonomy. Finally, it provides some guidelines toward the selection and use of the most relevant techniques, summarizing their principal characteristics.

Macedo N, Cunha A.  2011.  Automatic unbounded verification of Alloy specifications with Prover 9. CoRR. abs/1209.5773:17. Abstract1209.5773v1.pdf

Alloy is an increasingly popular lightweight specification language based on relational logic. Alloy models can be automatically verified within a bounded scope using off-the-shelf SAT solvers. Since false assertions can usually be disproved using small counter-examples, this approach suffices for most applications. Unfortunately, it can sometimes lead to a false sense of security, and in critical applications a more traditional unbounded proof may be required. The automatic theorem prover Prover9 has been shown to be particularly effective for proving theorems of relation algebras, a quantifier-free (or point-free) axiomatization of a fragment of relational logic. In this paper we propose a translation from Alloy specifications to fork algebras (an extension of relation algebras with the same expressive power as relational logic) which enables their unbounded verification in Prover9. This translation covers not only logic assertions, but also the structural aspects (namely type declarations), and was successfully implemented and applied to several examples.

Oliveira JN.  2011.  Pointfree Foundations for (Generic) Lossless Decomposition. :64. Abstracthaslab-tr201103.pdf

This report presents a typed, “pointfree” generalization of relational data depen- dency theory expressed not in the standard set-theoretic way, “a` la Codd”, but in the calculus of binary relations which, initiated by De Morgan in the 1860s, is the core of modern algebra of programming.
Contrary to the intuition that a binary relation is just a particular case of an n- ary relation, this report shows the effectiveness of the former in “explaining” and reasoning about the latter. Data dependency theory, which is central to relational database design, is addressed in pointfree calculational style instead of reasoning about (sets of) tuples in conventional “implication-first” logic style.
It turns out that the theory becomes more general, more structured and sim- pler. Elegant expressions replace lengthy formulæ and easy-to-follow calculations replace pointwise proofs with lots of “· · ·” notation, case analysis and natural lan- guage explanations for “obvious” steps. In particular, attributes are generalized to arbitrary (observation) functions and the principle of lossless decomposition is established for arbitrary such functions.
The report concludes by showing how the proposed generalization of data dependency theory paves the way to interesting synergies with other branches of computer science, namely formal modeling and transition systems theory. A number of open topics for research in the field are proposed as future work.

Bonchi F, Bonsangue M, Rutten J, Silva A.  2011.  Brzozowski's algorithm (co)algebraically. Software Engineering [SEN]. :1-13. Abstract18796d.pdf

We give a new presentation of Brzozowski's algorithm to minimize finite automata, using elementary facts from universal algebra and coalgebra, and building on earlier work by Arbib and Manes on the duality between reachability and observability. This leads to a simple proof of its correctness and opens the door to further generalizations.

Kozen D, Silva A.  2011.  Left-handed completeness. Computing and Information Science Technical Reports. :1-18. Abstractleft_1.pdf

We give a new, significantly shorter proof of the completeness of the left-handed star rule of Kleene algebra. The proof reveals the rich interaction of algebra and coalgebra in the theory.

Kozen D, Silva A.  2011.  On Moessner's theorem. :1-8. Abstractpascal.pdf

Moessner's theorem describes a procedure for generating a sequence of n integer sequences that lead unexpectedly to the sequence of nth powers 1n, 2n, 3n, ... Paasche's theorem is a generalization of Moessner's; by varying the parameters of the procedure, one can obtain the sequence of factorials 1!, 2!, 3!, ... or the sequence of superfactorials 1!!, 2!!, 3!!, ... Long's theorem generalizes Moessner's in another direction, providing a procedure to generate the sequence a1n-1, (a+d)2n-1, (a+2d)3n-1, ... Proofs of these results in the literature are typically based on combinatorics of binomial coefficients or calculational scans. In this note we give a short and revealing algebraic proof of a general theorem that contains Moessner's, Paasche's, and Long's as special cases. We also prove a generalization that gives new Moessner-type theorems.

Bonsangue M, Caltais G, Goriac E, Lucanu D, Rutten J, Silva A.  2011.  Automatic equivalence proofs for non-deterministic coalgebras (revised and extended). :1-38. Abstract18793d.pdf

A notion of generalized regular expressions for a large class of systems modeled as coalgebras, and an analogue of Kleene’s theorem and Kleene algebra, were recently proposed by a subset of the authors of this paper. Examples of the systems covered include infinite streams, deterministic automata, Mealy machines and labelled transition sytems. In this paper, we present a novel algorithm to decide whether two expressions are bisimilar or not. The procedure is implemented in the automatic theorem prover CIRC, by reducing coinduction to an entailment relation between an algebraic specification and an appropriate set of equations. We illustrate the generality of the tool with three examples: infinite streams of real numbers, Mealy machines and labelled transition systems.

Bonchi F, Bonsangue M, Boreale M, Rutten J, Silva A.  2011.  A coalgebraic perspective on linear weighted automata. :1-31. Abstract18069a.pdf

Weighted automata are a generalization of non-deterministic automata where each transition, in addition to an input letter, has also a quantity expressing the weight (e.g. cost or probability) of its execution. As for non-deterministic automata, their behaviours can be expressed in terms of either (weighted) bisimilarity or (weighted) language equivalence.

Coalgebras provide a categorical framework for the uniform study of state-based systems and their behaviours.
In this work, we show that coalgebras can suitably model weighted automata in two different ways: coalgebras on Set (the category of sets and functions) characterize weighted bisimilarity, while coalgebras on Vect (the category of vector spaces and linear maps) characterize weighted language equivalence.

Relying on the second characterization, we show three different procedures for computing weighted language equivalence. The first one consists in a generalization of the usual partition refinement algorithm for ordinary automata. The second one is the backward version of the first one. The third procedure relies on a syntactic representation of rational weighted languages.

Bonchi F, Hülsbusch M, König B, Silva A.  2011.  A coalgebraic perspective on minimization, determinization and behavioural metrics. :1-23. Abstract18079d_1.pdf

Coalgebra offers a unified theory of state based systems, including infinite streams, labelled transition systems and deterministic automata. In this paper, we use the coalgebraic view on systems to derive, in a uniform way, abstract procedures for checking behavioural equivalence in coalgebras, which perform (a combination of) minimization and determinization in the system. First, we show that for coalgebras on categories equipped with factorization structures, there exists an abstract procedure for equivalence checking. For instance, when considering epi-mono factorizations in the category of sets and functions, this procedure corresponds to the usual (coalgebraic) minimization algorithm and two states are behaviourally equivalent iff they are mapped to the same state in the minimized coalgebra. Second, motivated by several examples, we consider coalgebras on categories without suitable factorization structures: under certain conditions, it is possible to apply the above procedure after transforming coalgebras with reflections. This transformation can be thought of as some kind of determinization. Finally, we show that the result of the procedure also induces a pseudo-metric on the states, in such a way that the distance between each pair of states is minimized.

Silva A.  2011.  A specification language for Reo connectors. :1-17. Abstract18033d.pdf

Recent approaches to component-based software engineering employ coordinating connectors to compose components into software systems. Reo is a model of component coordination, wherein complex connectors are constructed by composing various types of primitive channels. Reo automata are a simple and intuitive formal model of context-dependent connectors, which provided a compositional semantics for Reo. In this paper, we study Reo automata from a coalgebraic perspective. This enables us to apply the recently developed coalgebraic theory of generalized regular expressions in order to derive a specification language, tailor-made for Reo automata, and sound and complete axiomatizations with respect to three distinct notions of equivalence: (coalgebraic) bisimilarity, the bisimulation notion studied in the original papers on Reo automata and trace equivalence. The obtained language is simple, but nonetheless expressive enough to specify all possible finite Reo automata. Moreover, it comes equipped with a Kleene-like theorem: we provide algorithms to translate expressions to Reo automata and, conversely, to compute an expression equivalent to a state in a Reo automaton.

Beckwith L, Cunha J, Fernandes JP, Saraiva JA.  2010.  End Users Productivity in Model-based Spreadsheets: An Empirical Study. Abstracttr_study.pdf

Spreadsheets are widely used by end users, and studies have shown that most end-user spreadsheets contain non-trivial errors. To improve end users productivity, recent research proposes the use of a model-driven engineering approach to spreadsheets. In this paper we conduct the first systematic empirical study to assess the effectiveness and efficiency of this approach. A set of spreadsheet end users worked with two different model-based spreadsheets, and we present and analyze the results achieved.

Cunha J, Visser J, Alves T, Saraiva JA.  2010.  Type-safe Evolution of Spreadsheets. Abstracttr_evolution.pdf

Spreadsheets are notoriously error-prone. To help avoid the introduction of errors when changing spreadsheets, models that capture the structure and interdependencies of spreadsheets at a conceptual level have been proposed. Thus, spreadsheet evolution can be made safe within the confines of a model. As in any other model/instance setting, evolution may not only require changes at the instance level but also at the model level. When model changes are required, the safety of instance evolution can not be guarded by the model alone. Coupled transformation of models and instances are supported by the 2LT platform and have been applied for transformation of algebraic datatypes, XML schemas, and relational database models. We have extended 2LT to spreadsheet evolution. We have designed an appropriate representation of spreadsheet models, including the fundamental notions of formulæ, references, and blocks of cells. For these models and their instances, we have designed coupled transformation rules that cover specific spreadsheet evolution steps, such as extraction of a block of cells into a separate sheet or insertion of columns in all occurrences of a repeated block of cells. Each model-level transformation rule is coupled with instance level migration rules from the source to the target model and vice versa. These coupled rules can be composed to create compound transformations at the model level that induce compound transformations at the instance level. With this approach, spreadsheet evolution can be made safe, even when model changes are involved.

Preguiça N, Moreno CB, Almeida PS, Fonte V, Gonçalves R.  2010.  Dotted Version Vectors: Logical Clocks for Optimistic Replication. CoRR. 1011.5808 Abstractdvv-arxiv.pdf

In cloud computing environments, a large number of users access data stored in highly available storage systems. To provide good performance to geographically disperse users and allow operation even in the presence of failures or network partitions, these systems often rely on optimistic replication solutions that guarantee only eventual consistency. In this scenario, it is important to be able to accurately and efficiently identify updates executed concurrently. In this paper, first we review, and expose problems with current approaches to causality tracking in optimistic replication: these either lose information about causality or do not scale, as they require replicas to maintain information that grows linearly with the number of clients or updates.Then, we propose a novel solution that fully captures causality while being very concise in that it maintains information that grows linearly only with the number of servers that register updates for a given data element, bounded by the degree of replication.