Recent Publications

Abreu R, Gonzalez-Sanchez A, Van Gemund AJC.  2011.  A diagnostic reasoning approach to defect prediction. Modern Approaches in Applied Intelligence. :416–425. Abstract

During software testing, defect prediction approaches measure current reliability status, forecasting future program failures, and provide information on how many defects need to be removed before shipping. Existing approaches often require faults to be detected and identified as a new one, before a model-based trend can be fitted. While during regression testing failures may frequently occur, it is not evident which are related to new faults. Consequently, reliability growth trending can only be performed in sync with fault identification and repair, which is often performed in between regression test cycles. In this paper we present a dynamic, reasoning approach to estimate the number of defects in the system early in the process of regression testing. Our approach, coined Dracon, is based on Bayesian fault diagnosis over abstractions of program traces (also known as program spectra). Experimental results show that Dracon systematically estimates the exact number of (injected) defects, provided sufficient tests cases are available. Furthermore, we also propose a simple, analytic performance model to assess the influence of failed test cases in the estimation. We observe that our empirical findings are in agreement with the model.

Correia A, Pereira JO, Rodrigues L, Oliveira R, Carvalho N.  2010.  Practical Database Replication. Replication: Theory and Practice. Abstract

This chapter illustrates how the concepts and algorithms described earlier in this book can be used to build practical database replication systems. This is achieved first by addressing architectural challenges on how required functionality is provided by generally available software componentes and then how different components can be efficiently integrated. A second set of practical challenges arises from experience on how performance assumptions map to actual environments and real workloads. The result is a generic architecture for replicated database management systems, focusing on the interfaces between key components, and then on how different algorithmic and practical optimization options map to real world gains. This shows how consistent database replication is achievable in the current state of the art.

Leitão J, Pereira JO, Rodrigues L.  2010.  Gossip-based broadcast. Handbook of Peer-to-Peer Networking. Abstract

Gossip, or epidemic, protocols have emerged as a powerful strategy to implement highly scalable and resilient reliable broadcast primitives on large scale peer-to-peer networks. Epidemic protocols are scalable because they distribute the load among all nodes in the system and resilient because they have an intrinsic level of redundancy that masks node and network failures. This chapter provides an introduction to gossip-based broadcast on large-scale unstructured peer-to-peer overlay networks: it surveys the main results in the field, discusses techniques to build and maintain the overlays that support efficient dissemination strategies, and provides an in-depth discussion and experimental evaluation of two concrete protocols, named HyParView and Plumtree.

Leitão J, Carvalho N, Pereira JO, Oliveira R, Rodrigues L.  2010.  On Adding Structure to Unstructured Overlay Networks. Handbook of Peer-to-Peer Networking. Abstract

Unstructured peer-to-peer overlay networks are very resilient to churn and topology changes, while requiring little maintenance cost. Therefore, they are an infrastructure to build highly scalable large-scale services in dynamic networks. Typically, the overlay topology is defined by a peer sampling service that aims at maintaining, in each process, a random partial view of peers in the system. The resulting random unstructured topology is suboptimal when a specific performance metric is considered. On the other hand, structured approaches (for instance, a spanning tree) may optimize a given target performance metric but are highly fragile. In fact, the cost for maintaining structures with strong constraints may easily become prohibitive in highly dynamic networks. This chapter discusses different techniques that aim at combining the advantages of unstructured and structured networks. Namely we focus on two distinct approaches, one based on optimizing the overlay and another based on optimizing the gossip mechanism itself.

Macedo H, Oliveira JN.  2010.  Matrices As Arrows! A Biproduct Approach to Typed Linear Algebra. 10th International Conference Mathematics of Program Construction. 6120:271–287. Abstract

Motivated by the need to formalize generation of fast running code for linear algebra applications, we show how an index-free, calculational approach to matrix algebra can be developed by regarding matrices as morphisms of a category with biproducts. This shifts the traditional view of matrices as indexed structures to a type-level perspective analogous to that of the pointfree algebra of programming.The derivation of fusion, cancellation and abide laws from the biproduct equations makes it easy to calculate algorithms implementing matrixmultiplication, the kernel operation of matrix algebra, ranging from its divide-and-conquer version to the conventional, iterative one. From errant attempts to learn how particular products and coproducts emerge from biproducts, we not only rediscovered block-wise matrix combinators but also found a way of addressing other operations calculationally such as e.g. Gaussian elimination. A strategy for addressing vectorization along the same lines is also given.

Abreu R, González A, Zoeteweij P, Gemund AJC.  2010.  Using fault screeners for software error detection. Evaluation of Novel Approaches to Software Engineering. :60–74. Abstract

Fault screeners are simple software (or hardware) constructs that detect variable value errors based on unary invariant checking. In this paper we evaluate and compare the performance of three low-cost screeners (Bloom filter, bitmask, and range screener) that can be automatically integrated within a program, and trained during the testing phase. While the Bloom filter has the capacity of retaining virtually all variable values associated with proper program execution, this property comes with a much higher false positive rate per unit of training effort, compared to the more simple range and bitmask screeners, that compresses all value information in terms of a single lower and upper bound or a bitmask, respectively. We present a novel analytic model that predicts the false positive and false negative rate for ideal (i.e., screeners that store each individual value during training) and simple (e.g., range and bitmask) screeners. We show that the model agrees with our empirical findings. Furthermore, we describe an application of all screeners, where the screener’s error detection output is used as input to a fault localization process that provides automatic feedback on the location of residual program defects during deployment in the field.

Backhouse R, Chen W, Ferreira JF.  2010.  The Algorithmics of Solitaire-Like Games. Mathematics of Program Construction (LNCS 6120). Abstract

Puzzles and games have been used for centuries to nurture problem-solving skills. Although often presented as isolated brain-teasers, the desire to know how to win makes games ideal examples for teaching algorithmic problem solving. With this in mind, this paper explores one-person solitaire-like games.
The key to understanding solutions to solitaire-like games is the identification of invariant properties of polynomial arithmetic. We demonstrate this via three case studies: solitaire itself, tiling problems and a collection of novel one-person games. The known classification of states of the game of (peg) solitaire into 16 equivalence classes is used to introduce the relevance of polynomial arithmetic. Then we give a novel algebraic formulation of the solution to a class of tiling problems. Finally, we introduce an infinite class of challenging one-person games inspired by earlier work by Chen and Backhouse on the relation between cyclotomic polynomials and generalisations of the seven-trees-in-one type isomorphism. We show how to derive algorithms to solve these games.

Ferreira JF.  2010.  Designing an Algorithmic Proof of the Two-Squares Theorem. Mathematics of Program Construction (LNCS 6120). Abstract

We show a new and constructive proof of the two-squares theorem, based on a somewhat unusual, but very effective, way of rewriting the so-called extended Euclid’s algorithm. Rather than simply verifying the result—as it is usually done in the mathematical community—we use Euclid’s algorithm as an interface to investigate which numbers can be written as sums of two positive squares. The precise formulation of the problem as an algorithmic problem is the key, since it allows us to use algorithmic techniques and to avoid guessing. The notion of invariance, in particular, plays a central role in our development: it is used initially to observe that Euclid’s algorithm can actually be used to represent a given number as a sum of two positive squares, and then it is used throughout the argument to prove other relevant properties. We also show how the use of program inversion techniques can make mathematical arguments more precise.

Sousa P, Preguiça N, Moreno CB.  2009.  Forby: providing groupware features relying on distributed file system event dissemination. Groupware: Design, Implementation, and Use. 5784(5784):158–173. Abstract

Abstract. Intensive research and development has been conducted in the design and creation of groupware systems for distributed users. While for some activities, these groupware tools are widely used, for other activities the impact in the groupware community has been smaller and can be improved. One reason for this fact is that the mostly common used applications do not support collaborative features and users are reluctant to change to a different application. In this paper we discuss how available file system mechanisms can help to address this problem. In this context, we present Forby, a system that allows to provide groupware features to distributed users by combining filesystem monitoring and distributed event dissemination. To demonstrate our solution, we present three systems that rely on Forby for providing groupware features to users running unmodified applications.

Oliveira JN.  2009.  Extended Static Checking by Calculation using the Pointfree Transform. LerNet ALFA Summer School. 5520:195–251. Abstract

The pointfree transform offers to the predicate calculus what the Laplace transform offers to the differential/integral calculus: the possibility of changing the underlying mathematical space so as to enable agile algebraic calculation. This paper addresses the foundations of the transform and its application to a calculational approach to extended static checking (ESC) in the context of abstract modeling. In particular, a calculus is given whose rules help in breaking the complexity of the proof obligations involved in static checking arguments. The close connection between such calculus and that of weakest pre-conditions makes it possible to use the latter in ESC proof obligation discharge, where pointfree notation is again used, this time to calculate with invariant properties to be maintained. A connection with the “everything is a relation” lemma of Alloy is established, showing how close to each other the pointfree and Alloy notations are. The main advantage of this connection is that of complementing pen-and-paper pointfree calculations with model checking support wherever validating sizable abstract models.

Carvalho N, Oliveira JP, Pereira JO.  2009.  Evaluating Throughput Stability of Protocols for Distributed Middleware. On the Move to Meaningful Internet Systems: OTM 2009. :600–613. Abstract

Communication of large data volumes is a core functionality of distributed systems middleware, namely, for interconnecting components, for distributed computation and for fault tolerance. This common functionality is how- ever achieved in different middleware platforms with various combinations of operating system and application level protocols, both standardized and ad hoc, and including implementations on managed runtime environments such as Java.
In this paper, in contrast with most previous work that focus on performance, we point out that architectural and implementation decisions have an impact in throughput stability when the system is heavily loaded, precisely when such sta- bility is most important. In detail, we present an experimental evaluation of several communication protocol components under stress conditions and conclude on the relative merits of several architectural options.

Meng S, Barbosa LS.  2009.  A Coalgebraic Semantic Framework for Reasoning about Interaction Designs. UML Semantics and Applications. :249–279. Abstract

If, as a well-known aphorism states, modelling is for reasoning, this chapter is an attempt to define and apply a formal semantics to interaction patterns captured by UML 2.0 sequence diagrams in order to enable rigourous reasoning about them. Actually, model transformation plays a fundamental role in the process of software development, in general, and in model driven engineering in particular. Being a de facto standard in this area, UML is no exception, even if the number and diversity of diagrams expressing UML models makes it difficult to base its semantics on a single framework. This chapter argues for the use of coalgebra theory, as the mathematics of state-based designs, to give a precise semantics to sequence diagrams with coercions. In particular, it illustrates how an algebra for constructing and reasoning about them can be made explicit and useful for the working software architect.

Abreu R, Zoeteweij P, Van Gemund AJC.  2009.  A Model-Based Software Reasoning Approach to Software Debugging. Opportunities and Challenges for Next-Generation Applied Intelligence. :233–239. Abstract

Current model-based approaches to software debugging use static program analysis to derive a model of the program. In contrast, in the software engineering domain diagnosis approaches are based on analyzing dynamic execution behavior. We present a model-based approach where the program model is derived from dynamic execution behavior, and evaluate its diagnostic performance on the Siemens software benchmark, extended by us to accommodate multiple faults. We show that our approach outperforms other model-based software debugging techniques, which is partly due to the use of De Kleer’s intermittency model to account for the variability of software component behavior.

Ferreira JF, Mendes A, Backhouse R, Barbosa LS.  2009.  Which Mathematics for the Information Society? Teaching Formal Methods. Abstract

MathIS is a new project that aims to reinvigorate secondary-school mathematics by exploiting insights of the dynamics of algorithmic problem solving. This paper describes the main ideas that underpin the project. In summary, we propose a central role for formal logic, the development of a calculational style of reasoning, the emphasis on the algorithmic nature of mathematics, and the promotion of self-discovery by the students. These ideas are discussed and the case is made, through a number of examples that show the teaching style that we want to introduce, for their relevance in shaping mathematics training for the years to come. In our opinion, the education of software engineers that work effectively with formal methods and mathematical abstractions should start before university and would benefit from the ideas discussed here.

Parisaca Vargas A, Garis A, Tarifa SLT, George C.  2009.  Model Checking LTL Formulae in RAISE with FDR. Integrated Formal Methods. 5423:231-245. Abstractifm09.pdf

The Raise Specification Language (RSL) is a modeling language which supports various specification styles. To apply model checking to RSL concurrent descriptions, we translate RSL specifications into the input language CSPM of FDR. FDR is the model checker for the process algebra CSP. First, we define a syntactic and semantic translation from the concurrent applicative subset of RSL to CSPM, and show that this translation is a strong bisimulation which preserves properties such as traces and deadlock. Consequently, results obtained by refinement checks in FDR are sound for the original RSL descriptions. Second, RSL uses Linear Temporal Logic (LTL) to specify desired properties, but FDR does not support LTL. LTL formulas may be translated to CSP test processes in order to check them with FDR. We build a tool which automates the translation of RSL specifications into CSPM and translates LTL formulas to CSP processes, enabling the model checking of LTL formulas over RSL descriptions with FDR.