Reports

Almeida PS, Shoker A, Moreno CB.  2015.  Efficient State-based CRDTs by Delta-Mutation. CoRR. abs/1410.2803:1-19. Abstractdelta-tr.pdf

CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs achieve this by sharing local state changes through shipping the entire state, that is then merged to other replicas with an idempotent, associative, and commutative join operation, ensuring convergence. This imposes a large communication overhead as the state size becomes larger. We introduce Delta State Conflict-Free Replicated Datatypes ({\delta}-CRDT), which make use of {\delta}-mutators, defined in such a way to return a delta-state, typically, with a much smaller size than the full state. Delta-states are joined to the local state as well as to the remote states (after being shipped). This can achieve the best of both worlds: small messages with an incremental nature, as in operation-based CRDTs, disseminated over unreliable communication channels, as in traditional state-based CRDTs. We introduce the {\delta}-CRDT framework, and we explain it through establishing a correspondence to current state- based CRDTs. In addition, we present two anti-entropy algorithms: a basic one that provides eventual convergence, and another one that ensures both convergence and causal consistency. We also introduce two {\delta}-CRDT specifications of well-known replicated datatypes.

Moreno CB, Almeida PS, Cunha A, Ferreira C.  2015.  Composition of State-based CRDTs. Abstractcrdtcompositionreport.pdf

State-based CRDTs are rooted in mathematical structures called join-semilattices (more simply lattices, in this context). These order structures ensure that the replicated states of the defined data types evolve and increase in a partial order in a sufficiently defined way, so as to ensure that all concurrent evolutions can be merged deterministically. In order to build, or understand the building principles, of state-based CRDTs it is necessary to understand the basic building blocks of the support lattices and how lattices can be composed.

Cunha A, Jorge T, Macedo N.  2015.  A Feature-based Classification of Model Repair Approaches. Abstract1504.03947v1.pdf

Consistency management, the ability to detect, diagnose and handle inconsistencies, is crucial during the development process in Model-driven Engineering (MDE). As the popularity and application scenarios of MDE expanded, a variety of different techniques were proposed to address these tasks in specific contexts. Of the various stages of consistency management, this work focuses on inconsistency fixing in MDE, where such task is embodied by model repair techniques. This paper proposes a feature-based classification system for model repair techniques, based on an systematic review of previously proposed approaches. We expect this work to assist both the developers of novel techniques and the MDE practitioners looking for suitable solutions.

Barbosa LS, Neves R, Martins MA, Hofmann D.  2015.  Continuity as a computational effect. Abstract1507.03219v1.pdf

The original purpose of component–based development was to provide techniques to master complex software, through composition, reuse and parametrisation. However, such systems are rapidly moving towards a level in which software becomes prevalently intertwined with (continuous) physical processes. A possible way to accommodate the latter in component calculi relies on a suitable encoding of continuous behaviour as (yet another) computational effect. This paper introduces such an encoding through a monad which, in the compositional development of hybrid systems, may play a role similar to the one played by the 1+, powerset, and distribution monads in the characterisation of partial, non deterministic and probabilistic components, respectively. This monad and its Kleisli category provide a setting in which the effects of continuity over (different forms of) composition can be suitably studied.

Saraiva JA, Maia P, Mendes J, Cunha J, Rebêlo H, Saraiva JA.  2015.  Towards the Design and Implementation of Aspect-Oriented Programming for Spreadsheets. Abstract1503.03463v1_2.pdf

A spreadsheet usually starts as a simple and singleuser software artifact, but, as frequent as in other software systems, quickly evolves into a complex system developed by many actors. Often, different users work on different aspects of the same spreadsheet: while a secretary may be only involved in adding plain data to the spreadsheet, an accountant may define new business rules, while an engineer may need to adapt the spreadsheet content so it can be used by other software systems. Unfortunately, spreadsheet systems do not offer modular mechanisms, and as a consequence, some of the previous tasks may be defined by adding intrusive “code” to the spreadsheet. In this paper we go through the design and implementation of an aspect-oriented language for spreadsheets so that users can work on different aspects of a spreadsheet in a modular way. For example, aspects can be defined in order to introduce new business rules to an existing spreadsheet, or to manipulate the spreadsheet data to be ported to another system. Aspects are defined as aspect-oriented program specifications that are dynamically woven into the underlying spreadsheet by an aspect weaver. In this aspect-oriented style of spreadsheet development, different users develop, or reuse, aspects without adding intrusive code to the original spreadsheet. Such code is added/executed by the spreadsheet weaving mechanism proposed in this paper.

Saraiva JA, Pereira R, Fernandes JP, Cunha J.  2015.  Querying Spreadsheets: An Empirical Study. Abstract1502.07948.pdf

One of the most important assets of any company is being able to easily access information on itself and on its business. In this line, it has been observed that this important information is often stored in one of the millions of spreadsheets created every year, due to simplicity in using and manipulating such an artifact. Unfortunately, in many cases it is quite difficult to retrieve the intended information from a spreadsheet: information is often stored in a huge unstructured matrix, with no care for readability or comprehensiveness. In an attempt to aid users in the task of extracting information from a spreadsheet, researchers have been working on models, languages and tools to query. In this paper we present an empirical study evaluating such proposals assessing their usage to query spreadsheets. We investigate the use of the Google Query Function, textual modeldriven querying, and visual model-driven querying. To compare these different querying approaches we present an empirical study whose results show that the end-users' productivity increases when using model-driven queries, specially using its visual representation.

Almeida JB, Barbosa MB, Barthe G, Davy G, Dupressoir F, Grégoire B, Strub P-Y.  2014.  Verified Implementations for Secure and Verifiable Computation. IACR Cryptology ePrint Archive. 456 Abstract456.pdf

Formal verification of the security of software systems is gradually moving from the traditional focus on idealized models, to the more ambitious goal of producing verified implementations. This trend is also present in recent work targeting the verification of cryptographic software, but the reach of existing tools has so far been limited to cryptographic primitives, such as RSA-OAEP encryption, or standalone protocols, such as SSH. This paper presents a scalable approach to formally verifying implementations of higherlevel cryptographic systems, directly in the computational model. We consider circuit-based cloud-oriented cryptographic protocols for secure and verifiable computation over encrypted data. Our examples share as central component Yao’s celebrated transformation of a boolean circuit into an equivalent “garbled” form that can be evaluated securely in an untrusted environment. We leverage the foundations of garbled circuits set forth by Bellare, Hoang, and Rogaway (CCS 2012, ASIACRYPT 2012) to build verified implementations of garbling schemes, a verified implementation of Yao’s secure function evaluation protocol, and a verified (albeit partial) implementation of the verifiable computation protocol by Gennaro, Gentry, and Parno (CRYPTO 2010). The implementations are formally verified using EasyCrypt, a tool-assisted framework for building high-confidence cryptographic proofs, and critically rely on two novel features: a module and theory system that supports compositional reasoning, and a code extraction mechanism for generating implementations from formalizations.

Silva A, SergeyGoncharov, Milius S.  2014.  Towards a Coalgebraic Chomsky Hierarchy. Abstract

The Chomsky hierarchy plays a prominent role in the foundations of the theoretical computer science relating classes of formal languages of primary importance. In this paper we use recent developments on coalgebraic and monad-based semantics to obtain a generic notion of a T-automaton, where T is a monad, which allows the uniform study of various notions of machines (e.g. finite state machines, multi-stack machines, Turing machines, valence automata, weighted automata). We use the generalized powerset construction to define a generic (trace) semantics for T-automata, and we show by numerous examples that it correctly instantiates for the known classes of machines/languages captured by the Chomsky hierarchy. Moreover, our approach provides new generic techniques for proving expressivity bounds of various machine-based models.

Silva A, Bonchi F, Milius S, Zanasi F.  2014.  How to Kill Epsilons with a Dagger - A Coalgebraic Take on Systems with Algebraic Label Structure. Abstract

We propose an abstract framework for modeling state-based systems with internal behavior as e.g. given by silent or ϵ-transitions. Our approach employs monads with a parametrized fixpoint operator † to give a semantics to those systems and implement a sound procedure of abstraction of the internal transitions, whose labels are seen as the unit of a free monoid. More broadly, our approach extends the standard coalgebraic framework for state-based systems by taking into account the algebraic structure of the labels of their transitions. This allows to consider a wide range of other examples, including Mazurkiewicz traces for concurrent systems.

Campos F, Pereira JO.  2014.  Improving the Scalability oWS- Based Networked Infrastructures. 1:1-28. Abstract1407.8546.pdf

The Devices Profile for Web Services (DPWS) specification enables seamless discovery, configuration, and interoperability of networked devices in various settings, ranging from home automation and multimedia to manufacturing equipment and data centers. Unfortunately, the sheer simplicity of event notification mechanisms that makes it fit for resource-constrained devices, makes it hard to scale to large infrastructures with more stringent dependability requirements, ironically, where self-configuration would be most useful. In this report, we address this challenge with a proposal to integrate gossip-based dissemination in DPWS, thus maintaining compatibility with original assumptions of the specification, and avoiding a centralized configuration server or custom black-box middleware components. In detail, we show how our approach provides an evolutionary and non-intrusive solution to the scalability limitations of DPWS and experimentally evaluate it with an implementation based on the the Web Services for Devices (WS4D) Java Multi Edition DPWS Stack (JMEDS).

Pacheco H, Macedo N, Cunha A, Voigtländer J.  2013.  A Generic Scheme and Properties of Bidirectional Transformations. CoRR. arXiv/1306.4473:19. Abstract1306.4473v2.pdf

The recent rise of interest in bidirectional transformations (BXs) has led to the development of many BX frameworks, originating in diverse computer science disciplines. From a user perspective, these frameworks vary significantly in both interface and predictability of the underlying bidirectionalization technique. In this paper we start by presenting a generic BX scheme that can be instantiated to different concrete interfaces, by plugging-in the desired notion of update and traceability. Based on that scheme, we then present several desirable generic properties that may characterize a BX framework, and show how they can be instantiated to concrete interfaces. This generic presentation is useful when exploring the BX design space: it might help developers when designing new frameworks and end-users when comparing existing ones. We support the latter claim, by applying it in a comparative survey of popular existing BX frameworks.

Shoker A, Bahsoun JP.  2013.  BFT Selection. :1-15. AbstractPaper

One-size-fits-all protocols are hard to achieve in Byzantine fault tolerance (BFT). As an alternative, BFT users, e.g., enterprises, need an easy and efficient method to choose the most convenient protocol that matches their preferences best. The various BFT protocols that have been proposed so far differ significantly in their characteristics and performance which makes choosing the ‘preferred’ protocol hard. In addition, if the state of the deployed system is too fluctuating, then perhaps using multiple protocols at once is needed; this requires a dynamic selection mechanism to move from one protocol to another. In this paper, we present the first BFT selection model and algorithm that can be used to choose the most convenient protocol according to user preferences. The selection algorithm applies some mathematical formulas to make the selection process easy and automatic. The algorithm operates in three modes: Static, Dynamic, and Heuristic. The Static mode addresses the cases where a single protocol is needed; the Dynamic mode assumes that the system conditions are quite ?uctuating and thus requires run-time decisions, and the Heuristic mode is similar to the Dynamic mode but it uses additional heuristics to improve user choices. We give some examples to describe how selection occurs. We show that our approach is automated, easy, and yields reasonable results that match reality. To the best of our knowledge, this is the first work that addresses selection in BFT.

Jin D, Yang B, Moreno CB, Liu D, He D, Liu J.  2013.  Markov random walk under constraint for discovering overlapping communities in complex networks. :21. Abstract1303.5675.pdf

The detection of overlapping communities in complex networks has motivated recent research in relevant fields. Aiming to address this problem, we propose a Markov-dynamics-based algorithm, called UEOC, which means 'unfold and extract overlapping communities'. In UEOC, when identifying each natural community that overlaps, a Markov random walk method combined with a constraint strategy, which is based on the corresponding annealed network (degree conserving random network), is performed to unfold the community. Then, a cutoff criterion with the aid of a local community function, called conductance, which can be thought of as the ratio between the number of edges inside the community and those leaving it, is presented to extract this emerged community from the entire network. The UEOC algorithm depends on only one parameter whose value can be easily set, and it requires no prior knowledge of the hidden community structures. The proposed UEOC has been evaluated both on synthetic benchmarks and on some real-world networks, and has been compared with a set of competing algorithms. The experimental result has shown that UEOC is highly effective and efficient for discovering overlapping communities.

Zawirsky M, Bieniusa A, Balegas V, Duarte S, Moreno CB, Shapiro M, Preguiça N.  2013.  SwiftCloud: Fault-tolerant geo-replication integrated all the way to the client machine. :24. Abstract1310.3107v1.pdf

Client-side logic and storage are increasingly used in web and mobile applications to improve response time and availability. Current approaches tend to be ad-hoc and poorly integrated with the server-side logic. We present a principled approach to integrate client- and server-side storage. We support mergeable and strongly consistent transactions that target either client or server replicas and provide access to causally-consistent snapshots efficiently. In the presence of infrastructure faults, a client-assisted failover solution allows client execution to resume immediately and seamlessly access consistent snapshots without waiting. We implement this approach in SwiftCloud, the first transactional system to bring geo-replication all the way to the client machine. Example applications show that our programming model is useful across a range of application areas. Our experimental evaluation shows that SwiftCloud provides better fault tolerance and at the same time can improve both latency and throughput by up to an order of magnitude, compared to classical geo-replication techniques.

Almeida PS, Moreno CB.  2013.  Scalable eventually consistent counters over unreliable networks. :1-32. Abstract1307.3207v1_1.pdf

Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network "likes". Classic distributed counters, strongly consistent, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios. This paper defines Eventually Consistent Distributed Counters (ECDC) and presents an implementation of the concept, Handoff Counters, that is scalable and works over unreliable networks. By giving up the sequencer aspect of classic distributed counters, ECDC implementations can be made AP in the CAP design space, while retaining the essence of counting. Handoff Counters are the first CRDT (Conflict-free Replicated Data Type) based mechanism that overcomes the identity explosion problem in naive CRDTs, such as G-Counters (where state size is linear in the number of independent actors that ever incremented the counter), by managing identities towards avoiding global propagation and garbage collecting temporary entries. The approach used in Handoff Counters is not restricted to counters, being more generally applicable to other data types with associative and commutative operations.

Murta D, Oliveira JN.  2013.  Calculating risk in functional programming. :38. Abstract1311.3687v1.pdf

In the trend towards tolerating hardware unreliability, accuracy is exchanged for cost savings. Running on less reliable machines, "functionally correct" code becomes risky and one needs to know how risk propagates so as to mitigate it. Risk estimation, however, seems to live outside the average programmer's technical competence and core practice. In this paper we propose that risk be constructively handled in functional programming by (a) writing programs which may choose between expected and faulty behaviour, and by (b) reasoning about them in a linear algebra extension to standard, a la Bird-Moor algebra of programming. In particular, the propagation of faults across standard program transformation techniques known as tupling and fusion is calculated, enabling the fault of the whole to be expressed in terms of the faults of its parts.

Riboira A, Abreu R.  2013.  END-A Lightweight Algorithm to Estimate the Number of Defects in Software. :8. Abstract955913_1.pdf

Defect precision provides information on how many defects a given software application appears to have. Existing approaches are usually based on time consuming model-based techniques. A viable alternative is the previously presented Abacus algorithm, which is based on Bayesian fault diagnosis.This paper presents a novel alternative approach - coined End that uses the same input and produces the same output as the Abacus algorithm, but is considerably more time ecient. An experiment was conducted to compare both the accuracy and performance of these two algorithms. The End algorithm presented the same accuracy as the Abacus algorithm, but outperformed it in the majority of executions.

Bieniusa A, Zawirsky M, Preguiça N, Shapiro M, Moreno CB, Balegas V, Duarte S.  2012.  An optimized conflict-free replicated set. CoRR. 1210.3368(8083):9. Abstract1210.3368v1.pdf

Eventual consistency of replicated data supports concurrent updates, reduces latency and improves fault tolerance, but forgoes strong consistency. Accordingly, several cloud computing platforms implement eventually-consistent data types. The set is a widespread and useful abstraction, and many replicated set designs have been proposed. We present a reasoning abstraction, permutation equivalence, that systematizes the characterization of the expected concurrency semantics of concurrent types. Under this framework we present one of the existing conflict-free replicated data types, Observed-Remove Set. Furthermore, in order to decrease the size of meta-data, we propose a new optimization to avoid tombstones. This approach that can be transposed to other data types, such as maps, graphs or sequences.

Macedo N, Pacheco H, Cunha A.  2012.  Relations as executable specifications: taming partiality and non-determinism using invariants. :146-161. Abstractndlenses12tech.pdf

The calculus of relations has been widely used in program specification and reasoning. It is very tempting to use such specifications as running prototypes of the desired program, but, even considering finite domains, the inherent partiality and non-determinism of relations makes this impractical and highly inefficient. To tame partiality we prescribe the usage of invariants, represented by coreflexives, to characterize the exact domains and codomains of relational specifications. Such invariants can be used as pre-condition checkers to avoid runtime errors. Moreover, we show how such invariants can be used to narrow the non-deterministic execution of relational specifications, making it viable for a relevant class of problems. In particular, we show how the proposed techniques can be applied to execute specifications of bidirectional transformations, a domain where partiality and non-determinism are paramount.

Shoker A, Yabandeh M, Guerraoui R, Bahsoun JP.  2012.  Obfuscated BFT. :1-15. AbstractPaper

A major assumption underlies the correctness of BFT services: failure independence. However, state of the art protocols rely on inter-replica communication in order to preserve consistency, and they implicitly require replicas to store access information about each other. This jeopardizes the independence of failure assumption since intruders can sneak from one replica to another and compromises the entire service. In this paper, we leverage this issue by exploring the idea of obfuscation in the BFT context: the replicas of a BFT service are completely unaware of each others identities. We present a new client-based replication BFT protocol, called OBFT, that assumes honest, but possibly crash-prone clients. The protocol imposes equivalent load on replicas to avoid bottlenecks, and reduces the load on the primary replica by distributing multi-cast and cryptographic tasks among clients. We evaluate OBFT on an Emulab cluster with a wide area topology and convey its scalability with respect to state of the art BFT protocols.

Jeannin J-B, Kozen D, Silva A.  2012.  CoCaml: Programming with Coinductive Types. :1-24. Abstractcocaml.pdf

We present CoCaml, a functional programming language extending OCaml, which allows us to define functions on coinductive datatypes parameterized by an equation solver. We provide numerous examples that attest to the usefulness of the new programming constructs, including operations on infinite lists, infinitary lambda-terms and p-adic numbers.

Kozen D, Silva A.  2012.  Practical Coinduction. :1-22. Abstractstructural.pdf

Induction is a well-established proof principle that is taught in most undergraduate programs in mathematics and computer science. In computer science, it is used primarily to reason about inductively-defined datatypes such as finite lists, finite trees, and the natural numbers. Coinduction is the dual principle that can be used to reason about coinductive datatypes such as infinite streams or trees, but it is not as widespread or as well understood. In this paper, we illustrate through several examples the use of coinduction in informal mathematical arguments. Our aim is to promote the principle as a useful tool for the working mathematician and to bring it to a level of familiarity on par with induction. We show that coinduction is not only about bisimilarity and equality of behaviors, but also applicable to a variety of functions and relations defined on coinductive datatypes.