Recent Publications

Murta D, Oliveira JN.  2013.  Calculating risk in functional programming. :38.

In the trend towards tolerating hardware unreliability, accuracy is exchanged for cost savings. Running on less reliable machines, "functionally correct" code becomes risky, and one needs to know how risk propagates so as to mitigate it. Risk estimation, however, seems to live outside the average programmer's technical competence and core practice. In this paper we propose that risk be handled constructively in functional programming by (a) writing programs which may choose between expected and faulty behaviour, and (b) reasoning about them in a linear-algebra extension of the standard, à la Bird-de Moor, algebra of programming. In particular, the propagation of faults across the standard program transformation techniques known as tupling and fusion is calculated, enabling the fault of the whole to be expressed in terms of the faults of its parts.
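A toy rendering of idea (a) above, assuming independence of faults (our simplification; the paper works instead in a linear-algebra calculus): each component carries the probability that it behaves as expected, and sequential composition multiplies these probabilities, so the fault of the whole is computed from the faults of its parts.

```python
# Illustrative sketch only (not the paper's calculus): a risky
# computation is a pure function paired with the probability that it
# behaves as expected; composing two components multiplies their
# success probabilities under an assumed independence of faults.

class Risky:
    def __init__(self, f, p_ok):
        self.f = f            # expected behaviour
        self.p_ok = p_ok      # probability of not faulting

    def then(self, other):
        # run self, then other; both must behave as expected
        return Risky(lambda x: other.f(self.f(x)), self.p_ok * other.p_ok)

double = Risky(lambda x: 2 * x, 0.9)   # faulty 10% of the time
succ = Risky(lambda x: x + 1, 0.8)     # faulty 20% of the time
pipeline = double.then(succ)           # succeeds with probability 0.9 * 0.8
```

The risks of the tupled and fused programs studied in the paper would be derived by similar, if more refined, calculations.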

Riboira A, Abreu R.  2013.  END-A Lightweight Algorithm to Estimate the Number of Defects in Software. :8.

Defect prediction provides information on how many defects a given software application appears to have. Existing approaches are usually based on time-consuming model-based techniques. A viable alternative is the previously presented Abacus algorithm, which is based on Bayesian fault diagnosis. This paper presents a novel alternative approach, coined End, that uses the same input and produces the same output as the Abacus algorithm, but is considerably more time-efficient. An experiment was conducted to compare both the accuracy and the performance of the two algorithms. The End algorithm showed the same accuracy as the Abacus algorithm, but outperformed it in the majority of executions.

Bieniusa A, Zawirsky M, Preguiça N, Shapiro M, Moreno CB, Balegas V, Duarte S.  2012.  An optimized conflict-free replicated set. CoRR. 1210.3368(8083):9.

Eventual consistency of replicated data supports concurrent updates, reduces latency and improves fault tolerance, but forgoes strong consistency. Accordingly, several cloud computing platforms implement eventually-consistent data types. The set is a widespread and useful abstraction, and many replicated set designs have been proposed. We present a reasoning abstraction, permutation equivalence, that systematizes the characterization of the expected concurrency semantics of concurrent types. Under this framework we specify one of the existing conflict-free replicated data types, the Observed-Remove Set. Furthermore, in order to decrease the size of the metadata, we propose a new optimization that avoids tombstones. This approach can be transposed to other data types, such as maps, graphs or sequences.
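The baseline semantics being optimized can be sketched as follows (a plain tagged Observed-Remove Set with tombstones, our code; the paper's contribution is precisely a variant that avoids the tombstone set kept here):

```python
import uuid

class ORSet:
    """Sketch of a basic observed-remove set. Each add is tagged with a
    unique identifier; remove only deletes the tags it has observed, so
    a concurrent add (with a fresh tag) wins over a remove."""
    def __init__(self):
        self.adds = set()      # (element, unique_tag) pairs
        self.removes = set()   # tombstones: tags observed as removed

    def add(self, e):
        self.adds.add((e, uuid.uuid4().hex))

    def remove(self, e):
        self.removes |= {(x, t) for (x, t) in self.adds if x == e}

    def value(self):
        return {x for (x, t) in self.adds if (x, t) not in self.removes}

    def merge(self, other):
        self.adds |= other.adds
        self.removes |= other.removes

# two replicas diverge and reconcile (add-wins on concurrent add/remove)
a, b = ORSet(), ORSet()
a.add('x')
b.merge(a)          # b observes the add...
b.remove('x')       # ...and removes it
a.add('x')          # concurrently, a re-adds with a fresh tag
a.merge(b)
b.merge(a)          # both replicas now converge on {'x'}
```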

Macedo N, Pacheco H, Cunha A.  2012.  Relations as executable specifications: taming partiality and non-determinism using invariants. :146-161.

The calculus of relations has been widely used in program specification and reasoning. It is very tempting to use such specifications as running prototypes of the desired program, but, even considering finite domains, the inherent partiality and non-determinism of relations makes this impractical and highly inefficient. To tame partiality we prescribe the usage of invariants, represented by coreflexives, to characterize the exact domains and codomains of relational specifications. Such invariants can be used as pre-condition checkers to avoid runtime errors. Moreover, we show how such invariants can be used to narrow the non-deterministic execution of relational specifications, making it viable for a relevant class of problems. In particular, we show how the proposed techniques can be applied to execute specifications of bidirectional transformations, a domain where partiality and non-determinism are paramount.
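A minimal finite-domain sketch (all names ours) of the role coreflexives play: composing a specification with the coreflexive characterizing its domain yields a pre-condition check instead of a runtime error.

```python
# Finite-domain relational toolkit. A coreflexive is the identity
# restricted to a predicate; composed with a specification, it acts as
# a pre-condition checker on the specification's domain.

def corefl(pred, universe):
    return {(x, x) for x in universe if pred(x)}

def compose(r, s):
    # first r, then s
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def run(spec, x):
    # non-deterministic execution: the set of all outputs at x
    return {y for (x2, y) in spec if x2 == x}

universe = range(2, 30)
# partial, non-deterministic spec: n is related to its non-trivial divisors
divides = {(n, d) for n in universe for d in range(2, n) if n % d == 0}
# the exact domain of the spec, as a coreflexive: the composite numbers
is_composite = corefl(lambda n: any(n % d == 0 for d in range(2, n)), universe)
guarded = compose(is_composite, divides)
```

`run(guarded, 13)` returns the empty set, signalling that the pre-condition fails on a prime, rather than an execution going wrong.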

Shoker A, Yabandeh M, Guerraoui R, Bahsoun JP.  2012.  Obfuscated BFT. :1-15.

A major assumption underlies the correctness of BFT services: failure independence. However, state-of-the-art protocols rely on inter-replica communication to preserve consistency, and they implicitly require replicas to store access information about each other. This jeopardizes the failure-independence assumption, since intruders can sneak from one replica to another and compromise the entire service. In this paper, we address this issue by exploring the idea of obfuscation in the BFT context: the replicas of a BFT service are completely unaware of each other's identities. We present a new client-based BFT replication protocol, called OBFT, that assumes honest, but possibly crash-prone, clients. The protocol imposes an equivalent load on replicas to avoid bottlenecks, and reduces the load on the primary replica by distributing multicast and cryptographic tasks among clients. We evaluate OBFT on an Emulab cluster with a wide-area topology and convey its scalability with respect to state-of-the-art BFT protocols.

Kozen D, Silva A.  2012.  Practical Coinduction. :1-22.

Induction is a well-established proof principle that is taught in most undergraduate programs in mathematics and computer science. In computer science, it is used primarily to reason about inductively-defined datatypes such as finite lists, finite trees, and the natural numbers. Coinduction is the dual principle that can be used to reason about coinductive datatypes such as infinite streams or trees, but it is not as widespread or as well understood. In this paper, we illustrate through several examples the use of coinduction in informal mathematical arguments. Our aim is to promote the principle as a useful tool for the working mathematician and to bring it to a level of familiarity on par with induction. We show that coinduction is not only about bisimilarity and equality of behaviors, but also applicable to a variety of functions and relations defined on coinductive datatypes.
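One of the informal devices the paper advocates can be mimicked with lazy generators (our encoding, not the paper's formalism): streams are defined by observation (head and tail), and two corecursive definitions can be compared on finite prefixes.

```python
from itertools import islice

# Coinductive streams mimicked by Python generators. zip_with is
# defined corecursively: its head is f applied to the heads, and its
# tail is zip_with of the tails.

def nats():
    n = 0
    while True:
        yield n
        n += 1

def zip_with(f, xs, ys):
    for x, y in zip(xs, ys):
        yield f(x, y)

def take(n, xs):
    # observe a finite prefix of an infinite stream
    return list(islice(xs, n))
```

For example, `take(5, zip_with(lambda a, b: a + b, nats(), nats()))` yields the first five even numbers, agreeing with the stream of doubled naturals on every finite prefix.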

Jeannin J-B, Kozen D, Silva A.  2012.  CoCaml: Programming with Coinductive Types. :1-24.

We present CoCaml, a functional programming language extending OCaml, which allows us to define functions on coinductive datatypes parameterized by an equation solver. We provide numerous examples that attest to the usefulness of the new programming constructs, including operations on infinite lists, infinitary lambda-terms and p-adic numbers.
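Since no CoCaml code is reproduced here, a hypothetical Python analogue (ours) of one of the listed examples may help: a function on a possibly-cyclic list that terminates because revisited nodes are detected, which plays the role that the equation solver plays in CoCaml.

```python
# Python rendering (ours) of the idea behind functions on coinductive
# data: naive recursion on a cyclic list would diverge, but detecting
# already-visited nodes lets us compute its (finite) set of elements.

class Cons:
    def __init__(self, head, tail=None):
        self.head, self.tail = head, tail

def elements(xs):
    """Set of elements of a possibly-cyclic list."""
    seen_nodes, out = set(), set()
    node = xs
    while node is not None and id(node) not in seen_nodes:
        seen_nodes.add(id(node))
        out.add(node.head)
        node = node.tail
    return out

# cyclic list 1 :: 2 :: 3 :: (back to the start)
xs = Cons(1, Cons(2, Cons(3)))
xs.tail.tail.tail = xs
```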

Jeannin J-B, Kozen D, Silva A.  2012.  Language constructs for non-well-founded computation. :1-12.

Recursive functions defined on a coalgebraic datatype may not converge if there are cycles in the input, that is, if the input object is not well-founded. Even so, there is often a useful solution; for example, the free variables of an infinitary lambda-term, or the expected running time of a finite-state probabilistic protocol. Theoretical models of recursion schemes have been well studied under the names well-founded coalgebras, recursive coalgebras, corecursive algebras, and Elgot algebras. Much of this work focuses on conditions ensuring unique or canonical solutions, e.g., when the domain is well-founded. If the domain is not well-founded, there can be multiple solutions. The standard semantics of recursive programs gives a particular solution, namely the least solution in a flat Scott domain, which may not be the one we want. Unfortunately, current programming languages provide no support for specifying alternatives. In this paper we give numerous examples in which it would be useful to do so and propose programming language constructs that would allow the specification of alternative solutions and methods to compute them. The programmer must essentially provide a way to solve equations in the codomain. We show how to implement these new constructs as an extension of existing languages. We also prove some theoretical results characterizing well-founded coalgebras that slightly extend results of Adámek, Lücke, and Milius (2007).
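The probabilistic example can be made concrete (our sketch): the expected number of coin flips until heads satisfies a cyclic recursive equation. The least solution in a flat Scott domain is undefined, which is exactly what the diverging naive recursion computes, while solving the same equation in the codomain (the reals) gives the intended answer.

```python
# The cyclic definition E = 1 + (1 - p) * E describes the expected
# number of flips of a coin (heads probability p) until heads appears.

def naive_expected_flips(p):
    # the standard recursive semantics: no base case, so this diverges
    return 1 + (1 - p) * naive_expected_flips(p)

def expected_flips(p):
    # "solve the equation in the codomain":
    # E = 1 + (1 - p) * E  =>  p * E = 1  =>  E = 1 / p
    return 1 / p
```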

Shapiro M, Preguiça N, Moreno CB, Zawirsky M, et al.  2011.  A comprehensive study of convergent and commutative replicated data types. :47.

Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs, monotonic DAGs, and sequences. It discusses some properties needed to implement non-trivial CRDTs.
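The simplest state-based CRDT in this family can be sketched as follows (our code): a grow-only counter whose merge is a pointwise maximum over per-replica counts, hence commutative, associative and idempotent, which is the sufficient condition for state-based convergence.

```python
# Sketch of a state-based grow-only counter CRDT. Each replica only
# increments its own entry; merge takes the pointwise maximum, so
# merges in any order, any number of times, converge to the same state.

class GCounter:
    def __init__(self, replica_id):
        self.id = replica_id
        self.counts = {}

    def inc(self, n=1):
        self.counts[self.id] = self.counts.get(self.id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        for r, c in other.counts.items():
            self.counts[r] = max(self.counts.get(r, 0), c)
```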

Shapiro M, Preguiça N, Moreno CB, Zawirsky M.  2011.  Conflict-free replicated data types. Stabilization, Safety, and Security of Distributed Systems. :386–400.

Replicating data under Eventual Consistency (EC) allows any replica to accept updates without remote synchronisation. This ensures performance and scalability in large-scale distributed systems (e.g., clouds). However, published EC approaches are ad-hoc and error-prone. Under a formal Strong Eventual Consistency (SEC) model, we study sufficient conditions for convergence. A data type that satisfies these conditions is called a Conflict-free Replicated Data Type (CRDT). Replicas of any CRDT are guaranteed to converge in a self-stabilising manner, despite any number of failures. This paper formalises two popular approaches (state- and operation-based) and their relevant sufficient conditions. We study a number of useful CRDTs, such as sets with clean semantics, supporting both add and remove operations, and consider in depth the more complex Graph data type. CRDT types can be composed to develop large-scale distributed applications, and have interesting theoretical properties.

Almeida PS, Moreno CB, Farach-Colton M, Jesus P, Mosteiro M.  2011.  Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution. :513-527.

Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) protocols, are not resilient to communication failures (or message loss) because such failures yield a loss of mass. In this paper, we present a protocol which we call Mass-Distribution with Flow-Updating (MDFU). We obtain MDFU by applying FU techniques to classic MD. We analyze the convergence time of MDFU, showing that stochastic message loss produces low overhead. This is the first convergence proof of an FU-based algorithm. We evaluate MDFU experimentally, comparing it with previous MD and FU protocols, and verifying the behavior predicted by the analysis. Finally, given that MDFU incurs a fixed deviation proportional to the message-loss rate, we adjust the accuracy of MDFU heuristically in a new protocol called MDFU with Linear Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave very well in practice, even under high rates of message loss and even when input values change dynamically.
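A much-simplified synchronous, fault-free simulation (ours, for averaging only) of the flow idea behind FU: each node's estimate is its input minus its outgoing flows, so the global sum is conserved by construction, and each round pushes the neighbourhood's estimates towards their local average by adjusting flows.

```python
# Simplified, fault-free, synchronous sketch of flow-based averaging.
# Per round, each node (1) adopts the symmetric of the flow its
# neighbour reported last round, (2) recomputes its estimate as input
# minus outgoing flows, and (3) adjusts flows so local estimates move
# to their neighbourhood average. Because estimates are derived from
# flows rather than from shared mass, a lost message would only delay
# convergence, not destroy mass.

def flow_updating(inputs, edges, rounds):
    n = len(inputs)
    nbrs = {i: set() for i in range(n)}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    flow = {(i, j): 0.0 for i in nbrs for j in nbrs[i]}
    est = [float(v) for v in inputs]
    for _ in range(rounds):
        old_flow, old_est = dict(flow), list(est)
        for i in nbrs:
            for j in nbrs[i]:
                flow[(i, j)] = -old_flow[(j, i)]     # adopt symmetric flow
            e_i = inputs[i] - sum(flow[(i, j)] for j in nbrs[i])
            avg = (e_i + sum(old_est[j] for j in nbrs[i])) / (len(nbrs[i]) + 1)
            for j in nbrs[i]:
                flow[(i, j)] += avg - old_est[j]     # push estimates to avg
            est[i] = avg
    return est
```

On a 3-node line with inputs `[0, 0, 9]`, all estimates settle at the true average 3.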

Jesus P, Moreno CB, Almeida PS.  2011.  A Survey of Distributed Data Aggregation Algorithms. arXiv preprint arXiv:1110.0725. :45.

Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, which can then be used to direct the execution of other applications. These values result from the distributed computation of functions like COUNT, SUM and AVERAGE. Application examples can be found in determining the network size, total storage capacity, average load, majorities, and many others.
In the last decade, many different approaches have been proposed, with different trade-offs in terms of accuracy, reliability, and message and time complexity. Due to the considerable number and variety of aggregation algorithms, it can be difficult and time-consuming to determine which techniques are most appropriate in specific settings, justifying the existence of a survey to aid in this task.
This work reviews the state of the art on distributed data aggregation algorithms, providing three main contributions. First, it formally defines the concept of aggregation, characterizing the different types of aggregation functions. Second, it succinctly describes the main aggregation techniques, organizing them in a taxonomy. Finally, it provides some guidelines toward the selection and use of the most relevant techniques, summarizing their principal characteristics.

Macedo N, Cunha A.  2011.  Automatic unbounded verification of Alloy specifications with Prover 9. CoRR. abs/1209.5773:17.

Alloy is an increasingly popular lightweight specification language based on relational logic. Alloy models can be automatically verified within a bounded scope using off-the-shelf SAT solvers. Since false assertions can usually be disproved using small counter-examples, this approach suffices for most applications. Unfortunately, it can sometimes lead to a false sense of security, and in critical applications a more traditional unbounded proof may be required. The automatic theorem prover Prover9 has been shown to be particularly effective for proving theorems of relation algebras, a quantifier-free (or point-free) axiomatization of a fragment of relational logic. In this paper we propose a translation from Alloy specifications to fork algebras (an extension of relation algebras with the same expressive power as relational logic) which enables their unbounded verification in Prover9. This translation covers not only logic assertions, but also the structural aspects (namely type declarations), and was successfully implemented and applied to several examples.

Oliveira JN.  2011.  Pointfree Foundations for (Generic) Lossless Decomposition. :64.

This report presents a typed, “pointfree” generalization of relational data dependency theory, expressed not in the standard set-theoretic way, “à la Codd”, but in the calculus of binary relations which, initiated by De Morgan in the 1860s, is the core of the modern algebra of programming.
Contrary to the intuition that a binary relation is just a particular case of an n-ary relation, this report shows the effectiveness of the former in “explaining” and reasoning about the latter. Data dependency theory, which is central to relational database design, is addressed in pointfree calculational style instead of reasoning about (sets of) tuples in conventional “implication-first” logic style.
It turns out that the theory becomes more general, more structured and simpler. Elegant expressions replace lengthy formulæ, and easy-to-follow calculations replace pointwise proofs with lots of “⋯” notation, case analyses and natural-language explanations for “obvious” steps. In particular, attributes are generalized to arbitrary (observation) functions and the principle of lossless decomposition is established for arbitrary such functions.
The report concludes by showing how the proposed generalization of data dependency theory paves the way to interesting synergies with other branches of computer science, namely formal modeling and transition systems theory. A number of open topics for research in the field are proposed as future work.
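As a flavour of the pointfree style (our phrasing, built only from standard ingredients, kernels and coreflexives, and not copied from the report): writing $\ker f$ for the kernel of an observation function $f$ and $\Phi$ for the coreflexive representing the tuples present in an instance, a functional dependency between observation functions reads

```latex
\ker f \;=\; f^{\circ}\cdot f
\qquad\qquad
f \to g \ \text{holds on}\ \Phi
\quad\Leftrightarrow\quad
\Phi \cdot \ker f \cdot \Phi \;\subseteq\; \ker g
```

pointwise: any two tuples admitted by $\Phi$ that agree on observation $f$ must agree on observation $g$.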

Bonchi F, Bonsangue M, Rutten J, Silva A.  2011.  Brzozowski's algorithm (co)algebraically. Software Engineering [SEN]. :1-13.

We give a new presentation of Brzozowski's algorithm to minimize finite automata, using elementary facts from universal algebra and coalgebra, and building on earlier work by Arbib and Manes on the duality between reachability and observability. This leads to a simple proof of its correctness and opens the door to further generalizations.
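The algorithm itself is short enough to sketch (our code, on automata kept in nondeterministic form throughout): reverse the automaton, determinize it by the subset construction, and do both once more; the result is a minimal DFA.

```python
# Sketch of Brzozowski's minimization. An automaton is a transition map
# (state, symbol) -> set of states, plus sets of initial and final states.

def reverse(trans, inits, finals):
    rtrans = {}
    for (q, a), ts in trans.items():
        for t in ts:
            rtrans.setdefault((t, a), set()).add(q)
    return rtrans, set(finals), set(inits)   # initial and final states swap

def determinize(alphabet, trans, inits, finals):
    start = frozenset(inits)
    states, stack, dtrans = {start}, [start], {}
    while stack:
        s = stack.pop()
        for a in alphabet:
            t = frozenset(q for p in s for q in trans.get((p, a), ()))
            dtrans[(s, a)] = {t}             # keep the NFA shape
            if t not in states:
                states.add(t)
                stack.append(t)
    dfinals = {s for s in states if s & finals}
    return states, dtrans, {start}, dfinals

def brzozowski(alphabet, trans, inits, finals):
    for _ in range(2):                       # reverse-determinize, twice
        trans, inits, finals = reverse(trans, inits, finals)
        states, trans, inits, finals = determinize(alphabet, trans, inits, finals)
    return states, trans, inits, finals

def accepts(trans, inits, finals, word):
    cur = set(inits)
    for a in word:
        cur = {q for p in cur for q in trans.get((p, a), ())}
    return bool(cur & finals)

# a non-minimal 4-state DFA for "even-length strings over {a}"
trans0 = {(i, 'a'): {(i + 1) % 4} for i in range(4)}
states, trans, inits, finals = brzozowski({'a'}, trans0, {0}, {0, 2})
```

On this example the four states collapse to the two of the minimal automaton, while acceptance is preserved.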