Recent Publications

Oliveira JN.  2014.  Relational Algebra for Just Good Enough Hardware. RAMiCS - 14th International Conference on Relational and Algebraic Methods in Computer Science. 8428:119-138.

Device miniaturization is pointing towards tolerating imperfect hardware provided it is “good enough”. Software design theories will have to face the impact of such a trend sooner or later.
A school of thought in software design is relational: it expresses specifications as relations and derives programs from specifications using relational algebra.
This paper proposes that linear algebra be adopted as an evolution of relational algebra able to cope with the quantification of the impact of imperfect hardware on (otherwise) reliable software.
The approach is illustrated by developing a monadic calculus for component oriented software construction with a probabilistic dimension quantifying (by linear algebra) the propagation of imperfect behaviour from lower to upper layers of software systems.
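
As a rough illustration of the linear algebra view (a sketch, not the paper's monadic calculus), an imperfect component can be modelled as a column-stochastic matrix whose column j gives the distribution of outputs for input j, so that sequential composition becomes matrix multiplication. The faulty negation gate, the 1% fault probability, and the helper names below are assumptions made for the example.

```python
from fractions import Fraction

def compose(m, n):
    """Matrix product m . n: apply component n first, then component m."""
    rows, inner, cols = len(m), len(n), len(n[0])
    return [[sum(m[i][k] * n[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def is_column_stochastic(m):
    """Every column must be a probability distribution (entries sum to 1)."""
    return all(sum(m[i][j] for i in range(len(m))) == 1 for j in range(len(m[0])))

# Assumed fault probability of the hypothetical hardware gate.
p = Fraction(1, 100)

# Ideal negation on bits: input 0 yields 1, input 1 yields 0.
ideal_not = [[0, 1],
             [1, 0]]

# Faulty negation: with probability p the gate returns its input unchanged.
faulty_not = [[p, 1 - p],
              [1 - p, p]]

# Two faulty negations in sequence should ideally behave as the identity.
double_not = compose(faulty_not, faulty_not)

# Probability that the composed component returns the wrong bit for input 0.
error_probability = double_not[1][0]
```

Here `error_probability` quantifies how the fault of the lower-level gate propagates to the composed component, which is the kind of question the abstract proposes to answer by linear algebra.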

Silva A, Jacobs B.  2014.  Initial Algebras of Terms With Binding and Algebraic Structure. Categories and Types in Logic, Language, and Physics - Lecture Notes in Computer Science. 8222:211-234.

One of the many results that make Joachim Lambek famous is that an initial algebra of an endofunctor is an isomorphism. This fixed point result is often referred to as "Lambek's Lemma". In this paper, we illustrate the power of initiality by exploiting it in categories of algebra-valued presheaves $\mathcal{EM}(T)^{\mathbb{N}}$, for a monad $T$ on $\mathbf{Sets}$.
The use of presheaves to obtain certain calculi of expressions (with variable binding) was introduced by Fiore, Plotkin, and Turi. They used set-valued presheaves, whereas here the presheaves take values in a category $\mathcal{EM}(T)$ of Eilenberg-Moore algebras. This generalisation allows us to develop a theory where more structured calculi can be obtained. The use of algebras also means that we work in a linear context and need a separate operation $!$ for replication, for instance to describe strength for an endofunctor on $\mathcal{EM}(T)$. We apply the resulting theory to give systematic descriptions of non-trivial calculi: we introduce non-deterministic and weighted lambda terms and expressions for automata as initial algebras, and we formalise relevant equations diagrammatically.

Silva A, Jacobs B.  2014.  Automata Learning: A Categorical Perspective. Horizons of the Mind. A Tribute to Prakash Panangaden - Lecture Notes in Computer Science. 8464.

Automata learning is a known technique to infer a finite state machine from a set of observations. In this paper, we revisit Angluin’s original algorithm from a categorical perspective. This abstract view of the main ingredients of the algorithm provides a uniform framework to derive algorithms for other types of automata. We show a straightforward generalization to Moore and Mealy machines, which yields an algorithm already known in the literature, and we discuss generalizations to other types of automata, including weighted automata.

Hofer B, Riboira A, Wotawa F, Abreu R, Getzner E.  2013.  On the empirical evaluation of fault localization techniques for spreadsheets. Fundamental Approaches to Software Engineering. :68–82.

Spreadsheets are by far the most prominent example of end-user programs of ample size and substantial structural complexity. In addition, spreadsheets are usually not tested very rigorously and thus contain faults. Locating faults is a hard task due to the size and the structure, which is usually not directly visible to the user, i.e., the functions are hidden behind the cells and only the computed values are presented. Hence, there is a strong need for debugging support. In this paper, we adapt three program-debugging approaches that have been designed for more traditional procedural or object-oriented programming languages. These techniques are Spectrum-based Fault Localization, Spectrum-Enhanced Dynamic Slicing, and Constraint-based Debugging. Besides the theoretical foundations, we present a more sophisticated empirical evaluation including a comparison of these approaches. The empirical evaluation shows that Sfl (Spectrum-based Fault Localization) and Sendys (Spectrum ENhanced Dynamic Slicing) are the most promising techniques.
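
To give a flavour of spectrum-based fault localization, the sketch below ranks components by the Ochiai similarity coefficient, a coefficient commonly used in the SFL literature. The abstract does not say which coefficient the paper uses, so treat this choice, the function name, and the toy spectra (columns standing for spreadsheet cells, rows for checked output values) as assumptions.

```python
import math

def ochiai_ranking(spectra, failed):
    """Rank components by Ochiai suspiciousness.

    spectra[t][c] is truthy if observation t involved component c;
    failed[t] is True if observation t revealed an error.
    Returns (component indices sorted most suspicious first, scores).
    """
    n_components = len(spectra[0])
    total_failed = sum(1 for f in failed if f)
    scores = []
    for c in range(n_components):
        # Failing and passing observations that involve component c.
        ef = sum(1 for row, f in zip(spectra, failed) if row[c] and f)
        ep = sum(1 for row, f in zip(spectra, failed) if row[c] and not f)
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    ranking = sorted(range(n_components), key=lambda c: scores[c], reverse=True)
    return ranking, scores

# Toy example: three cells, four checked outputs, two of them wrong.
spectra = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
]
failed = [True, True, False, False]

ranking, scores = ochiai_ranking(spectra, failed)
```

In this toy run the second cell (index 1) is ranked first, because it is involved in both failing observations and in only one passing one.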

Gonzalez-Sanchez A, Piel É, Abreu R, Gross H-G, Van Gemund AJC.  2013.  Prioritizing Tests for Fault Localization. Situation Awareness with Systems of Systems. :247–257.

In practically all development processes, regression tests are used to detect the presence of faults after a modification. If faults are detected, a fault localization algorithm can be used to reduce the manual inspection cost. However, while using test case prioritization to enhance the rate of fault detection of the test suite (e.g., statement coverage), the diagnostic information gain per test is not optimal, which results in needless inspection cost during diagnosis. We present RAPTOR, a test prioritization algorithm for fault localization, based on reducing the similarity between statement execution patterns as the testing progresses. Unlike previous diagnostic prioritization algorithms, RAPTOR does not require false negative information, and is much less complex. Experimental results from the Software Infrastructure Repository's benchmarks show that RAPTOR is the best technique under realistic conditions, with average cost reductions of 40% with respect to the next best technique, with negligible impact on fault detection capability.

Cardoso N, Abreu R.  2013.  A Distributed Approach to Diagnosis Candidate Generation. Progress in Artificial Intelligence. :175–186.

Generating diagnosis candidates for a set of failing transactions is an important challenge in the context of automatic fault localization of both software and hardware systems. Since the problem is NP-hard, exhaustive algorithms are usually prohibitive for real-world, often large, problems. In practice, heuristic-based approaches trade off completeness for time efficiency. An example of such heuristic approaches is Staccato, which was proposed in the context of reasoning-based fault localization. In this paper, we propose an efficient distributed algorithm, dubbed MHS2, that renders the sequential search algorithm Staccato suitable for distributed, Map-Reduce environments. The results show that MHS2 scales to larger systems (when compared to Staccato), while entailing either marginal or small runtime overhead.

Cardoso N, Abreu R.  2013.  MHS2: A Map-Reduce Heuristic-Driven Minimal Hitting Set Search Algorithm. Multicore Software Engineering, Performance, and Tools. :25–36.

Computing minimal hitting sets (also known as set covers) for a collection of sets is an important problem in many domains (e.g., model/reasoning-based fault diagnosis). Since the problem is NP-hard, exhaustive algorithms are usually prohibitive for real-world, often large, problems. In practice, heuristic-based approaches trade off completeness for time efficiency. An example of such heuristic approaches is Staccato, which was proposed in the context of reasoning-based fault localization. In this paper, we propose an efficient distributed algorithm, dubbed MHS2, that renders the sequential search algorithm Staccato suitable for distributed, Map-Reduce environments. The results show that MHS2 scales to larger systems (when compared to Staccato), while entailing either marginal or small runtime overhead.
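
For readers unfamiliar with the problem, the following sketch enumerates all minimal hitting sets by exhaustive search. It is neither Staccato nor MHS2; it is the kind of exponential-time baseline that the heuristic and distributed algorithms above are meant to avoid. The function name and the example conflict sets are assumptions.

```python
from itertools import combinations

def minimal_hitting_sets(collection):
    """Return every minimal set that intersects each set in `collection`.

    Candidates are enumerated by increasing size, so any candidate that
    contains an already accepted hitting set cannot be minimal.
    """
    universe = sorted(set().union(*collection))
    found = []
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            cand = set(candidate)
            if any(smaller <= cand for smaller in found):
                continue  # contains a smaller hitting set, hence not minimal
            if all(cand & s for s in collection):
                found.append(cand)
    return found

# Two hypothetical conflict sets: each failing transaction involved
# the listed components, so a diagnosis must intersect both sets.
conflicts = [{"a", "b"}, {"b", "c"}]
candidates = minimal_hitting_sets(conflicts)
```

With these two conflict sets the candidates are `{"b"}` and `{"a", "c"}`: either component b alone is faulty, or both a and c are.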

Koca F, Sözer H, Abreu R.  2013.  Spectrum-Based Fault Localization for Diagnosing Concurrency Faults. Testing Software and Systems. :239–254.

Concurrency faults are activated by specific thread interleavings at runtime. Traditional fault localization techniques and static analysis fall short of diagnosing these faults efficiently. Existing dynamic fault-localization techniques focus on pinpointing data-access patterns that are subject to concurrency faults. In this paper, we propose a spectrum-based fault localization technique for localizing faulty code blocks instead. We systematically instrument the program to create versions that run in particular combinations of thread interleavings. We run tests on all these versions and utilize spectrum-based fault localization to correlate detected errors with concurrently executing code blocks. We have implemented a tool and applied our approach to several industrial case studies. The case studies show that our approach can effectively and efficiently localize concurrency faults.

Martens C, Bosser A-G, Ferreira JF, Cavazza M.  2013.  Linear Logic Programming for Narrative Generation. Logic Programming and Nonmonotonic Reasoning (LNCS 8148).

In this paper, we explore the use of Linear Logic programming for story generation. We use the language Celf to represent narrative knowledge, and its own querying mechanism to generate story instances, through a number of proof terms. Each proof term obtained is used, through a resource-flow analysis, to build a directed graph where nodes are narrative actions and edges represent inferred causality relationships. Such graphs represent narrative plots structured by narrative causality. This approach is a candidate technique for narrative generation which unifies declarative representations and generation via query and deduction mechanisms.

Campos JC, Saraiva JA, Silva CE, Silva JC.  2012.  GUIsurfer: A Reverse Engineering Framework for User Interface Software. Reverse Engineering - Recent Advances and Applications. :31-54.

In the context of developing tool support for the automated analysis of interactive system implementations, this chapter aims to investigate the applicability of reverse engineering approaches to the derivation of user interface behavioural models. The ultimate goal is that these models might be used to reason about the quality of the system, from both a usability and an implementation perspective, as well as to help system maintenance, evolution and redesign.

Santos J, Abreu R.  2012.  Lightweight Automatic Error Detection by Monitoring Collar Variables. Testing Software and Systems. :215–230.

Although proven to be an effective way of detecting errors, generic program invariants (also known as fault screeners) entail a considerable runtime overhead, rendering them not useful in practice. This paper studies the impact of using simple variable patterns to detect the so-called system's collar variables to reduce the number of variables to be monitored (instrumented). Two different patterns were investigated to determine which variables to monitor. The first pattern finds variables whose value increases or decreases at regular intervals and deems them not important to monitor. The other pattern verifies the range of a variable per (successful) execution. If the range is constant across executions, then the variable is not monitored. Experiments were conducted on three different real-world applications to evaluate the reduction achieved in the number of variables monitored and determine the quality of the error detection. Results show a reduction of 52.04% on average in the number of monitored variables, while still maintaining a good detection rate with only 3.21% of executions detecting non-existing errors (false positives) and 5.26% not detecting an existing error (false negatives).

Andrés C, Abreu R, Núñez A.  2012.  OCE: an online collaborative editor. Computational Collective Intelligence. Technologies and Applications. :89–98.

In this paper we present the development of an Online Collaborative Editor (OCE) software system. It allows several people to easily edit and share computer files using different devices, such as mobile phones and PDAs.
We use formal methods in order to automate and describe OCE. The formalism is very suitable to specify time requirements (both time consumption due to the performance of tasks and timeouts) as well as to represent data communication among different components of the system.
This exercise convinced us that a formal approach to developing complex systems can facilitate some of the development phases. In particular, the testing and debugging phases, and more precisely the choice of the most suitable tests to apply, are simplified since tests are automatically extracted from the specification.

Jin D, Liu D, Yang B, Moreno CB, He D.  2011.  Ant colony optimization with Markov random walk for community detection in graphs. Advances in Knowledge Discovery and Data Mining. 6635:123–134.

The network clustering problem (NCP) is the problem of detecting network community structures. Building on Markov random walks, we address this problem with a new ant colony optimization strategy, named ACOMRW, which improves prior results on the NCP and does not require knowledge of the number of communities present in a given network. Ant colony optimization is taken as the basic framework of the ACOMRW algorithm. At each iteration, a Markov random walk model is taken as the heuristic rule; all of the ants’ local solutions are aggregated into a global one through a clustering ensemble, which is then used to update a pheromone matrix. The strategy relies on the progressive strengthening of within-community links and the weakening of between-community links. Gradually this converges to a solution where the underlying community structure of the complex network becomes clearly visible. The performance of ACOMRW was tested on a set of benchmark computer-generated networks, as well as on real-world network data sets. Experimental results confirm the validity of this approach and the improvements it achieves.

Moreno CB, Gonçalves N, José R.  2011.  Privacy preserving gate counting with collaborative bluetooth scanners. Workshops On the Move to Meaningful Internet Systems. 7046:534–543.

Due to its pervasiveness and communication capabilities, Bluetooth can be used as an infrastructure for several situated interaction and massive sensing scenarios. This paper shows how Bluetooth scanning can be used in gate counting scenarios, where the main goal is to provide an accurate count for the number of unique devices sighted. To this end, we present an analysis of several stochastic counting techniques that not only provide an accurate count for the number of unique devices, but offer privacy guarantees as well.
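
The abstract does not name the stochastic counting techniques analysed. As one well-known example of the family, the sketch below implements linear counting: each sighted device identifier is hashed to one position of a bitmap, and the number of distinct devices is estimated from the fraction of positions still unset. Only the bitmap is stored, never the identifiers, and bitmaps from several scanners can be combined with a bitwise OR. The class name, the bitmap size, and the made-up device identifiers are assumptions.

```python
import hashlib
import math

class LinearCounter:
    """Distinct-count estimator that stores a bitmap instead of identifiers."""

    def __init__(self, m=1024):
        self.m = m
        self.bits = [False] * m

    def add(self, device_id):
        # Hash the identifier and set the corresponding bitmap position.
        digest = hashlib.sha256(device_id.encode("utf-8")).digest()
        self.bits[int.from_bytes(digest[:8], "big") % self.m] = True

    def merge(self, other):
        """Combine with another scanner's bitmap (same m) via bitwise OR."""
        merged = LinearCounter(self.m)
        merged.bits = [a or b for a, b in zip(self.bits, other.bits)]
        return merged

    def estimate(self):
        zeros = self.bits.count(False)
        if zeros == 0:
            return float("inf")  # bitmap saturated: m is too small
        return -self.m * math.log(zeros / self.m)

# Two hypothetical scanners at the same gate with overlapping sightings.
scanner_a, scanner_b = LinearCounter(), LinearCounter()
for i in range(0, 150):
    scanner_a.add(f"device-{i}")
for i in range(100, 200):
    scanner_b.add(f"device-{i}")
    scanner_b.add(f"device-{i}")  # repeated sightings do not change the bitmap

combined = scanner_a.merge(scanner_b)
unique_estimate = combined.estimate()  # estimates the 200 distinct devices
```

The estimate is approximate by design; a larger bitmap gives better accuracy, while several devices mapping to the same position is what makes individual sightings hard to recover.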

Ferreira JF, Mendes A, Cunha A, Moreno CB, Silva P, Barbosa LS, Oliveira JN.  2011.  Logic Training through Algorithmic Problem Solving. Tools for Teaching Logic. 6680:62-69.

Although much of mathematics is algorithmic in nature, the skills needed to formulate and solve algorithmic problems do not form an integral part of mathematics education. In particular, logic, which is central to algorithm development, is rarely taught explicitly at pre-university level, under the justification that it is implicit in mathematics and therefore does not need to be taught as an independent topic. This paper argues in the opposite direction, describing a one-week workshop held at the University of Minho, in Portugal, whose goal was to introduce high-school students to calculational principles and techniques of algorithmic problem solving supported by calculational logic. The workshop resorted to recreational problems to convey the principles and to software tools, the Alloy Analyzer and NetLogo, to animate models.