Recent Publications

Pereira V.  2016.  A deductive verification tool for cryptographic software. Master's thesis.

Security is notoriously difficult to sell as a feature in software products. In addition to meeting a set of security requirements, cryptographic software has to be cheap, fast, and use few resources. The development of cryptographic software is an area with specific needs in terms of software development processes and tools. In this thesis we explore how formal techniques, namely deductive verification techniques, can be used to increase the guarantees that cryptographic software implementations indeed work as prescribed.
CAO (C and OCCAM) is a programming language specific to the domain of cryptography. Its control structures are similar to C's, but it incorporates data types that directly address the needs of a programmer translating specifications of cryptographic schemes (e.g., from scientific papers or standards) into real-world code. The CAO language is supported by a compiler and an interpreter developed by HASLab over a sequence of research and development projects.
The CAOVerif tool was designed to allow deductive verification of programs written in CAO. It follows the same paradigm as other tools available for high-level programming languages, such as Frama-C: a CAO program annotated with a specification is converted into an input program for the Jessie/Why3 tool-chain, where the specified properties are then analysed.
After the development of CAOVerif, a new tool specific to the domain of cryptography, named EasyCrypt, was developed. The objective of this project is to evaluate EasyCrypt as a potential backend for the CAOVerif tool, through the development of a prototype that demonstrates the advantages and disadvantages of this solution.

Oliveira N.  2015.  Architectural Reconfiguration of Interacting Services.

The exponential growth in the number of information technology users and the rise of their expectations have imposed a paradigm shift in the way software systems are developed: from monolithic to modular, from centralised to distributed, from static to dynamic. Software systems are nowadays regarded as coordinated compositions of several computational blocks, distributed over different execution nodes, within flexible and dynamic architectures.
Such systems are not flawless, though. Execution nodes may fail, new requirements may arise, or the deployment environment may evolve in such a way that the system's quality-of-service measures degrade. Reconfiguring, repairing, and adapting, preferably in a dynamic way, have thus become relevant concerns for the software architect.
But developing such systems correctly is still a challenge. In particular, current (formal) methods for characterising and analysing contextual changes and reconfiguration strategies fall behind engineering needs.
This thesis formalises a framework, referred to as aris, for modelling and analysing architectural reconfigurations. The focus is set on the coordination layer, understood in the context of the Reo model, as it plays the key role in defining the behaviour of compositional systems. The thesis therefore proposes the notion of a coordination pattern, a graph-based model of the coordination layer, and of reconfiguration patterns, parametric operations inducing topological changes in coordination patterns.
Properties of reconfigurations can be stated and evaluated from two different perspectives: behavioural and structural. The former compares the behavioural semantics of the reconfigured system, based on whatever semantic model one associates with coordination patterns.
The latter focuses on the graph topology of the coordination pattern. Properties are expressed in a propositional hybrid logic, referring to the actual connectivity expressed in that graph.
To bring quality of service into the picture, the thesis also contributes a new semantic model for stochastic Reo, based on interactive Markov chains. This opens new possibilities for analysing both coordination patterns and reconfigurations, in particular for inspecting the effects of reconfigurations on the system's quality of service, or for determining reconfiguration triggers based on variations of the latter.
Another contribution of the thesis is the integration of aris in a monitoring strategy that enables self-adaptation and attempts to deliver it as a service in a cloud environment.
Tools are delivered to support aris; in particular, language-based technology to encode, transform, and analyse coordination and reconfiguration patterns, materialised in a dedicated editor.
All the above-mentioned contributions are assessed through a case study in which a static system is reworked to support self-adaptation.

Gonçalves RC.  2015.  Parallel Programming by Transformation.

The development of efficient software requires the selection of algorithms and optimizations tailored for each target hardware platform. Alternatively, performance portability may be obtained through the use of optimized libraries. However, currently all the invaluable knowledge used to build optimized libraries is lost during the development process, limiting its reuse by other developers when implementing new operations or porting the software to a new hardware platform. To answer these challenges, we propose a model-driven approach and framework to encode and systematize the domain knowledge used by experts when building optimized libraries and program implementations. This knowledge is encoded by relating the domain operations with their implementations, capturing the fundamental equivalences of the domain, and defining how programs can be transformed by refinement (adding more implementation details), optimization (removing inefficiencies), and extension (adding features). These transformations enable the incremental derivation of efficient and correct-by-construction program implementations from abstract program specifications. Additionally, we designed an interpretation mechanism to associate different kinds of behavior to domain knowledge, allowing developers to animate programs and predict their properties (such as performance costs) during their derivation. We developed a tool to support the proposed framework, ReFlO, which we use to illustrate how knowledge is encoded and used to incrementally—and mechanically—derive efficient parallel program implementations in different application domains. The proposed approach is an important step to make the process of developing optimized software more systematic, and therefore more understandable and reusable. The knowledge systematization is also the first step to enable the automation of the development process.

Paulo J.  2015.  Dependable Decentralized Storage Management for Cloud Computing.

The volume of worldwide digital information is growing and will continue to grow at an impressive rate. Storage deduplication is accepted as a valuable technique for handling such data explosion: by eliminating unnecessary duplicate content from storage systems, both hardware and storage management costs can be reduced. Nowadays, this technique is applied to distinct storage types and is increasingly desired in cloud computing infrastructures, where a significant portion of worldwide data is stored. However, designing a deduplication system for cloud infrastructures is a complex task, as duplicates must be found and eliminated across a distributed cluster that supports virtual machines and applications with strict storage performance requirements. The core of this dissertation addresses precisely the challenges of deduplication in cloud infrastructures. We start by surveying and comparing existing deduplication systems and the distinct storage environments they target. This discussion is missing from the literature and is important for understanding the novel issues that must be addressed by cloud deduplication systems. Then, as our main contribution, we introduce our own deduplication system, which eliminates duplicates across virtual machine volumes in a distributed cloud infrastructure. Redundant content is found and removed in a cluster-wide fashion while having a negligible impact on the performance of applications using the deduplicated volumes.
Our prototype is evaluated in a real distributed setting with a benchmark suited for deduplication systems, which is also a contribution of this dissertation.
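The block-level mechanism underlying such systems can be illustrated with a minimal sketch, assuming fixed-size blocks indexed by a cryptographic hash; this is a generic illustration of storage deduplication, not the system developed in the dissertation:

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks, as commonly used for VM volumes

class DedupStore:
    """Content-addressable block store: identical blocks are kept once."""
    def __init__(self):
        self.blocks = {}      # digest -> block payload (stored once)
        self.refcount = {}    # digest -> number of logical references

    def write(self, data):
        """Store one logical volume; return its recipe (list of digests)."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the logical volume from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)

store = DedupStore()
volume = b"A" * 8192 + b"B" * 4096      # two identical blocks of A's
recipe = store.write(volume)
assert store.read(recipe) == volume
assert len(store.blocks) == 2           # only two unique blocks stored
```

A real cluster-wide system must additionally handle persistence, distributed indexing, and garbage collection of unreferenced blocks, which is where the performance challenges discussed above arise.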

Maia F.  2015.  Epidemic Store for Massive Scale Systems.

Considering state-of-the-art systems for data management, it is observable that they exhibit two main frailties when deployed at large scale. On one hand, the coordination protocols used in traditional relational database management systems do not perform well when the system grows beyond tens of nodes. On the other hand, data management approaches that relax consistency guarantees, thus avoiding coordination, struggle with high levels of system churn. In this dissertation, we present a completely decentralized, coordination-free, scalable, and robust data store. Our design is aimed at environments with several thousands of nodes and high levels of churn. Offering the now-ubiquitous key-value data structures and programming interfaces, we describe how to overcome the challenges raised by the need to distribute data (essential for load balancing), to replicate data (the crux of fault tolerance), and to route requests (key to performability). Alongside the design of our data store, we make several contributions in the context of distributed systems slicing. We propose a novel slicing protocol that overcomes state-of-the-art limitations. Additionally, we propose a novel epidemic algorithm for the scalable and decentralized organization of system nodes into groups. This algorithm is used as an alternative to slicing at the core of our system. It organizes nodes into groups of parameterizable size without nodes needing to know the system size. The contributions made on slicing protocols and the proposed group construction protocol are independent of the design of the data store; they are generic and can also be used as building blocks for other applications.

Neves F, Pereira JO, Vilaça R.  2015.  Performance Analysis and Optimization of Apache HBase for Relational Data.

The popularity that NoSQL systems have been gaining means they are constantly subjected to performance analyses and optimizations. Their ability to scale better than traditional relational databases has motivated the migration of numerous applications to NoSQL systems, even when the flexible data structure they provide is not exploited. However, querying such structured data carries an additional cost that derives from the flexibility of NoSQL systems.

This work demonstrates some performance limitations of Apache HBase, and proposes and evaluates Prepared Scan, an operation that exploits the application's knowledge of the data structure, thereby reducing the cost associated with querying structured data.

Martins P.  2014.  Embedding Attribute Grammars and their Extensions using Functional Zippers.

Attribute grammars are a formalism for expressing complex algorithms for program analysis and transformation, which typically require several traversals of the abstract trees that represent those programs. Attribute grammars have been extended with mechanisms supporting references, higher-order attributes, and circular attributes. These extensions allow the implementation of complex, modular mechanisms for computations over languages.
In this work we embed attribute grammars and their extensions in a simple and elegant way, using a technique based on functional zippers. In our approach, language specifications are defined as a set of independent, first-class components that can easily be composed into powerful language-processing environments.
We also develop techniques that describe bidirectional transformations between grammars. We define methods for specifying transformations that, through fully automatic mechanisms, are inverted and extended, generating the attribute grammars that specify our bidirectional environment.
Using this technique, several examples of language specification and processing were implemented, some of which are defined and explained in this document. We also created and developed a domain-specific language using our technique, and integrated it into a portal that allows program analyses to be created and fully configured to serve the particular requirements of each user.

Matos M.  2013.  Epidemic Algorithms for Large Scale Data Dissemination.

Distributed systems lie at the core of modern IT infrastructures and services, such as the Internet, e-commerce, the stock exchange, Cloud Computing and the SmartGrid. These systems, built and developed throughout the last decades, have relied, due to their importance, on distributed algorithms with strong correctness and safety guarantees. However, such algorithms have failed to accompany, for theoretical and practical reasons, the requirements of the distributed systems they support in terms of scale, scope and pervasiveness. Reality is unforgiving, and thus researchers needed to design and develop new algorithms based on probabilistic principles that, despite offering only probabilistic yet quantifiable guarantees, are suitable for today's modern distributed systems. In this dissertation, we study the challenges of, and propose solutions for, applying probabilistic dissemination algorithms, also known as epidemic or gossip-based algorithms, in very large scale distributed systems. In particular, we focus on the issues of scalability of content types (topic-based publish-subscribe), content size (efficient data dissemination) and ordering requirements (total order). For each of these issues, we present a novel distributed algorithm that solves the problem while matching state-of-the-art performance and trade-offs, and evaluate it in a realistic setting.
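As a minimal illustration of the epidemic principle these algorithms build on (not of the dissertation's own protocols), a synchronous push-gossip broadcast can be simulated in a few lines; the fanout of roughly ln(n) used below is a common rule of thumb for reaching all nodes with high probability:

```python
import math
import random

def gossip_broadcast(n, fanout, seed=None):
    """Simulate synchronous rounds of push gossip: every informed node
    relays the message to `fanout` peers picked uniformly at random.
    Returns the number of rounds until all n nodes are informed."""
    rng = random.Random(seed)
    informed = {0}              # node 0 injects the message
    rounds = 0
    while len(informed) < n:
        newly = set()
        for _ in informed:
            for peer in rng.sample(range(n), fanout):
                newly.add(peer)
        informed |= newly
        rounds += 1
    return rounds

n = 1000
rounds = gossip_broadcast(n, fanout=math.ceil(math.log(n)), seed=42)
# informed nodes multiply by at most fanout+1 = 8 per round, so at
# least ceil(log8(1000)) = 4 rounds are needed
assert rounds >= 4
```

The logarithmic round count and the redundancy of random peer choices are what give epidemic dissemination its robustness to node failures and message loss.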

Coelho F.  2013.  Implementation and test of transactional primitives over Cassandra.

NoSQL databases opt not to offer important abstractions traditionally found in relational databases in order to achieve high levels of scalability and availability: transactional guarantees and strong data consistency. These limitations bring considerable complexity to the development of client applications and are therefore an obstacle to the broader adoption of the technology. In this work we propose a middleware layer over NoSQL databases that offers transactional guarantees with Snapshot Isolation. The proposed solution is achieved in a non-intrusive manner, providing to the clients the same interface as a NoSQL database, simply adding the transactional context. The transactional context is the focus of our contribution and is modularly based on a Non Persistent Version Store that holds several versions of elements and interacts with an external transaction certifier. In this work, we present an implementation of our system over Apache Cassandra and, by using two representative benchmarks, YCSB and TPC-C, we measure the cost of adding transactional support with ACID guarantees.
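The interplay of a multi-version store and a certifier can be illustrated with a toy in-memory sketch of Snapshot Isolation using first-committer-wins certification; this shows the general technique only, not the middleware's actual code or API:

```python
class SIStore:
    """Toy multi-version store with a first-committer-wins certifier."""
    def __init__(self):
        self.versions = {}   # key -> [(commit_ts, value)] in commit order
        self.clock = 0       # logical commit timestamp

    def begin(self):
        """Start a transaction: fix its snapshot at the current clock."""
        return {"start": self.clock, "writes": {}}

    def read(self, txn, key):
        """Read the newest version visible in the transaction's snapshot."""
        if key in txn["writes"]:
            return txn["writes"][key]
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= txn["start"]:
                return value
        return None

    def write(self, txn, key, value):
        txn["writes"][key] = value       # buffered until commit

    def commit(self, txn):
        """Certify: abort on a write-write conflict with a concurrent commit."""
        for key in txn["writes"]:
            for ts, _ in self.versions.get(key, []):
                if ts > txn["start"]:
                    return False         # someone committed this key after us
        self.clock += 1
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return True

db = SIStore()
t1, t2 = db.begin(), db.begin()
db.write(t1, "x", 1)
db.write(t2, "x", 2)
assert db.commit(t1) is True      # first committer wins
assert db.commit(t2) is False     # concurrent write-write conflict aborts
```

The version list plays the role of the Non Persistent Version Store, while the conflict check in `commit` stands in for the external certifier.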

Portela B.  2013.  Cryptographic Security for Data Storage and Sharing in Cloud Environments.

Technological developments in the cloud model allow for an exponential growth in the online storage of information. File sharing services such as Dropbox or Google Drive already play an important role among users, promoting the demand for information availability and data sharing. The implementation of secure systems is strictly related to the concept of trust. In practice, data sharing systems tend to assume a weak level of trust, usually considering an adversary model that excludes the provider, reducing their usefulness for users with sensitive information. On the other hand, not trusting the provider in any way is a solution that contradicts the model itself, as it forces the bulk of processing to be performed locally, leaving the remote server's high performance unused. The challenge in developing cloud security therefore consists in adapting the trust level, balancing performance, security and functionality. This thesis proposes an analysis of current security solutions and their alternatives, as well as an implementation of a secure file sharing system. The initial objectives consist of both a theoretical and a practical approach to the subject: identifying the problems, presenting alternatives and testing their viability. Finally, the implemented system applies the previously evaluated solutions from a practical perspective, followed by a security validation that allows comparison with modern systems.

Jesus P.  2012.  Robust Distributed Data Aggregation.

Distributed aggregation algorithms are an important building block of modern large scale systems, as they allow the determination of meaningful system-wide properties (e.g., network size, total storage capacity, average load, or majorities) which are required to direct the execution of distributed applications. In the last decade, several algorithms have been proposed to address the distributed computation of aggregation functions (e.g., COUNT, SUM, AVERAGE, and MAX/MIN), exhibiting different properties in terms of accuracy, speed and communication trade-offs. However, existing approaches exhibit many issues when challenged in faulty and dynamic environments, lacking in terms of fault tolerance and support for churn.
This study details a novel distributed aggregation approach, named Flow Updating, which is fault-tolerant and able to operate on dynamic networks. The algorithm is based on manipulating flows (inspired by the concept from graph theory) that are updated using idempotent messages, providing it with unique robustness capabilities. Experimental results show that Flow Updating outperforms previous averaging algorithms in terms of time and message complexity, and, unlike them, it self-adapts to churn and to changes of the initial input values without requiring any periodic restart, supporting node crashes and high levels of message loss.
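The flow manipulation at the heart of Flow Updating can be sketched for the averaging case. The round structure below follows the published description of the algorithm, but variable names are ours and the simulation is synchronous and failure-free; it is a sketch of the idea, not the dissertation's implementation:

```python
def flow_updating(values, neighbors, rounds=200):
    """Synchronous Flow Updating for averaging: each node i keeps a flow
    towards each neighbour j, its estimate being values[i] minus the sum
    of its outgoing flows.  Enforcing flow[i][j] = -flow[j][i] keeps the
    global sum invariant, which is what makes the approach robust."""
    n = len(values)
    flow = [{j: 0.0 for j in neighbors[i]} for i in range(n)]
    est = list(values)
    for _ in range(rounds):
        prev_flow = [dict(f) for f in flow]   # messages from last round
        prev_est = list(est)
        for i in range(n):
            for j in neighbors[i]:
                flow[i][j] = -prev_flow[j][i]          # enforce symmetry
            local = values[i] - sum(flow[i].values())
            avg = (local + sum(prev_est[j] for j in neighbors[i])) \
                  / (len(neighbors[i]) + 1)
            for j in neighbors[i]:
                flow[i][j] += avg - prev_est[j]        # push the update
            est[i] = avg
    return est

vals = [10.0, 0.0, 2.0, 4.0]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
est = flow_updating(vals, ring)
assert abs(sum(est) - sum(vals)) < 1e-6   # the global sum is conserved
assert max(est) - min(est) < 1e-3         # all nodes agree on the average
```

Because the state is flows rather than mass exchanged in messages, a lost or duplicated message only delays convergence instead of corrupting the aggregate, which is the idempotence property exploited above.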
In addition to this main contribution, others can also be found in this research work, namely: a definition of the aggregation problem is proposed; existing distributed aggregation algorithms are surveyed and classified into a comprehensive taxonomy; and a novel algorithm, based on Flow Updating, is introduced to estimate the Cumulative Distribution Function (CDF) of a global system attribute.
It is expected that this work will constitute a relevant contribution to the area of distributed computing, in particular to the robust distributed computation of aggregation functions in dynamic networks.

Vilaça R.  2012.  Clouder: A Flexible Large Scale Decentralized Object Store.

Large scale data stores were initially introduced to support a few concrete extreme scale applications, such as social networks. Their scalability and availability requirements often justify sacrificing richer data and processing models, and even elementary data consistency. In strong contrast with traditional relational databases (RDBMS), large scale data stores present very simple data models and APIs, lacking most of the established relational data management operations, and relax consistency guarantees, providing only eventual consistency. With a number of alternatives now available and mature, there is an increasing willingness to use them in a wider and more diverse spectrum of applications, by skewing the current trade-off towards the needs of common business users and easing the migration from current RDBMS. This is particularly so when they are used in the context of a Cloud solution such as a Platform as a Service (PaaS). This thesis aims at reducing the gap between traditional RDBMS and large scale data stores by seeking mechanisms to provide additional consistency guarantees and higher level data processing primitives in large scale data stores. The devised mechanisms should not hinder the scalability and dependability of large scale data stores. Regarding higher level data processing primitives, this thesis explores two complementary approaches: extending data stores with additional operations, such as general multi-item operations; and coupling data stores with other existing processing facilities without hindering scalability. We address these challenges with a new architecture for large scale data stores, efficient multi-item access for large scale data stores, and SQL processing atop large scale data stores. The novel architecture allows the right trade-offs to be found among flexible usage, efficiency, and fault tolerance.
To efficiently support multi-item access, we extend first generation large scale data stores' data models with tags and a multi-tuple data placement strategy that allow large sets of related data to be stored and retrieved at once efficiently. For efficient SQL support atop scalable data stores, we devise design modifications to existing relational SQL query engines, allowing them to be distributed. We demonstrate our approaches with running prototypes and extensive experimental evaluation using proper workloads.

Shoker A.  2012.  Byzantine Fault Tolerance: From Static Selection to Dynamic Switching.

Byzantine Fault Tolerance (BFT) is becoming crucial with the revolution of online applications and the increasing number of innovations in computer technologies. Although dozens of BFT protocols were introduced in the previous decade, their adoption by practitioners remains disappointing. To some extent, this indicates that existing protocols are perhaps not yet convincing or satisfactory enough. The problem is that researchers are still trying to establish "the best protocol" using traditional methods, e.g., by designing new protocols. However, theoretical and experimental analyses demonstrate that it is hard to achieve a one-size-fits-all BFT protocol. Indeed, we believe that looking for smarter tactics, like "fastening fragile sticks with a rope to achieve a solid stick", is necessary to circumvent the issue. In this thesis, we introduce the first BFT selection model and algorithm that automate and simplify the election process of the "preferred" BFT protocol among a set of candidates. The selection mechanism operates in three modes: Static, Dynamic, and Heuristic. For the latter two modes, we present a novel BFT system, called Adapt, that reacts to any potential changes in the system conditions and switches dynamically between existing BFT protocols, i.e., seeking adaptation. The Static mode allows BFT users to choose a single BFT protocol only once. This is quite useful in Web Services and Clouds, where BFT can be sold as a service (and signed into the SLA contract). This mode is basically designed for systems whose state does not fluctuate too much. In this mode, an evaluation process is in charge of matching the user preferences against the profiles of the nominated BFT protocols, considering both reliability and performance. The elected protocol is the one that achieves the highest evaluation score. The mechanism is well automated via mathematical matrices, and produces selections that are reasonable and close to reality.
Some systems, however, may experience fluctuating conditions, like variable contention or message payloads. In this case, the Static mode will not be efficient, since a chosen protocol might not fit the new conditions. The Dynamic mode solves this issue. Adapt combines a collection of BFT protocols and switches between them, thus adapting to the changes of the underlying system state. Consequently, the "preferred" protocol is always polled for each system state, yielding an optimal quality of service, i.e., reliability and performance. Adapt monitors the system state through its Event System, and uses a Support Vector Regression method to conduct run-time predictions of the protocols' performance (e.g., throughput, latency, etc.). Adapt also operates in a Heuristic mode: using predefined heuristics, this mode optimizes user preferences to improve the selection process. The evaluation of our approach shows that selecting the "preferred" protocol is automated and close to reality in the Static mode. In the Dynamic mode, Adapt always achieves the optimal performance among the available protocols. The evaluation demonstrates that the overall system performance can be improved significantly too. Other cases show that it is not always worthwhile to switch between protocols. This is made possible by conducting predictions with high accuracy, which can reach more than 98% in many cases. Finally, the thesis shows that Adapt can be made smarter through the use of heuristics.
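The static-mode evaluation described above can be sketched as a weighted score over normalized protocol profiles. The protocol names below are real, but every metric value and weight is a made-up placeholder for illustration, not the thesis's actual evaluation matrices:

```python
def select_protocol(profiles, weights):
    """Score each candidate protocol by a weighted sum of metrics,
    min-max normalized across candidates, and pick the highest scorer."""
    metrics = list(weights)
    bounds = {}
    for m in metrics:
        col = [p[m] for p in profiles.values()]
        bounds[m] = (min(col), max(col))
    scores = {}
    for name, profile in profiles.items():
        total = 0.0
        for m in metrics:
            lo, hi = bounds[m]
            norm = (profile[m] - lo) / (hi - lo) if hi > lo else 1.0
            total += weights[m] * norm
        scores[name] = total
    return max(scores, key=scores.get), scores

profiles = {   # hypothetical metric values, higher is better
    "PBFT":    {"throughput": 0.6, "resilience": 0.9},
    "Zyzzyva": {"throughput": 0.9, "resilience": 0.5},
    "Q/U":     {"throughput": 0.8, "resilience": 0.4},
}
# a user who weighs reliability above raw performance
best, scores = select_protocol(profiles, {"throughput": 0.3, "resilience": 0.7})
assert best == "PBFT"
```

The Dynamic mode replaces the static profile matrix with run-time predictions (e.g., from regression over monitored conditions) and re-runs the same selection at each state change.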

Cunha J.  2011.  Model-based Spreadsheet Engineering.

Spreadsheets can be viewed as programming languages for non-professional programmers. These so-called "end-user" programmers vastly outnumber professional programmers, creating millions of new spreadsheets every year. As a programming language, spreadsheets lack support for abstraction, testing, encapsulation, or structured programming. As a result, and as numerous studies have shown, the high rate of production is accompanied by an alarmingly high rate of errors. Some studies report that up to 90% of real-world spreadsheets contain errors. After their initial creation, many spreadsheets turn out to be used for storing and processing increasing amounts of data and supporting increasing numbers of users over long periods of time, making them complicated systems.
An emerging solution for handling complex and evolving software systems is Model-Driven Engineering (MDE). Considering models as first-class entities, and any software artifact as a model or a model element, is one of the basic principles of MDE.
We adopted some techniques from MDE to solve spreadsheet problems. Most spreadsheets (if not all) lack a proper specification or model. Using reverse engineering techniques, we are able to derive various models from legacy spreadsheets. We use functional dependencies (a formalism that allows us to define how some column values depend on other column values) as building blocks for these models. Models can be used for several spreadsheet improvements, namely refactoring, safe evolution, migration, or even the generation of edit assistance. The techniques presented in this work are available in the HAEXCEL framework that we developed, composed of online and batch tools, reusable Haskell libraries, and extensions.
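The functional-dependency building block can be illustrated on toy data: a column a functionally determines a column b when each a-value maps to a single b-value. This is a brute-force sketch of single-column dependency discovery, far simpler than the inference actually performed by the framework:

```python
def determines(rows, a, b):
    """Check the functional dependency a -> b: every value in column a
    must map to exactly one value in column b."""
    seen = {}
    for row in rows:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False
        seen[row[a]] = row[b]
    return True

def infer_fds(rows, columns):
    """Brute-force discovery of single-column FDs over spreadsheet rows."""
    return [(a, b) for a in columns for b in columns
            if a != b and determines(rows, a, b)]

# toy spreadsheet: country determines currency, but not city
rows = [
    {"country": "PT", "currency": "EUR", "city": "Braga"},
    {"country": "PT", "currency": "EUR", "city": "Porto"},
    {"country": "UK", "currency": "GBP", "city": "London"},
]
fds = infer_fds(rows, ["country", "currency", "city"])
assert ("country", "currency") in fds
assert ("country", "city") not in fds     # PT maps to two different cities
```

Dependencies discovered this way are the raw material from which a relational model of the spreadsheet can be derived and then used to guide refactoring or edit assistance.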
A study with several end users was organized to survey the impact of the techniques we designed. The results of this study indicate that the models can bring great benefits to spreadsheet engineering, helping users to commit fewer errors and to work faster.