Publications

Coelho F, Paulo J, Vilaça R, Pereira JO, Oliveira R.  2017.  HTAPBench: Hybrid Transactional and Analytical Processing Benchmark. Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering. :293–304.

Coelho F, Matos M, Pereira JO, Oliveira R.  2017.  Similarity Aware Shuffling for the Distributed Execution of SQL Window Functions : BPA. Distributed Applications and Interoperable Systems - 17th IFIP WG 6.1 International Conference, DAIS 2017, Held as Part of the 12th International Federated Conference on Distributed Computing Techniques, DisCoTec 2017, Neuchâtel, Switzerland, June 1. :3–18.

Maia F, Paulo J, Coelho F, Neves F, Pereira JO, Oliveira R.  2017.  DDFlasks: Deduplicated Very Large Scale Data Store. Distributed Applications and Interoperable Systems - 17th IFIP WG 6.1 International Conference, DAIS 2017, Held as Part of the 12th International Federated Conference on Distributed Computing Techniques, DisCoTec 2017, Neuchâtel, Switzerland, June 1. :51–66.

Coelho F, Pereira JO, Vilaça R, Oliveira R.  2016.  Holistic Shuffler for the Parallel Processing of SQL Window Functions. Distributed Applications and Interoperable Systems - 16th IFIP WG 6.1 International Conference, DAIS 2016, Held as Part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, Heraklion, Crete, Greece, June. :75–81.

Window functions are a sub-class of analytical operators that allow data to be handled in a derived view of a given relation, while taking into account their neighboring tuples. Current systems overlook parallelization opportunities, which become especially relevant in Big Data settings where data is naturally partitioned.
We present a shuffling technique to improve the parallel execution of window functions when data is naturally partitioned and the query holds a partitioning clause that does not match the natural partitioning of the relation. We evaluated this technique with a non-cumulative ranking function and were able to reduce data transfer among parallel workers by 85% when compared to a naive approach.

Coelho F, Pereira JO, Vilaça R, Oliveira R.  2016.  Reducing Data Transfer in Parallel Processing of SQL Window Functions. Proceedings of the 6th International Conference on Cloud Computing and Services Science. :343-347.

Window functions are a sub-class of analytical operators that allow data to be handled in a derived view of a given relation, while taking into account their neighboring tuples. We propose a technique that can be used in the parallel execution of this operator when data is naturally partitioned. The proposed method benefits the cases where the required partitioning is not the natural partitioning employed. Preliminary evaluation shows that we are able to limit data transfer among parallel workers to 14% of the transfer observed when using a naive approach.

Maia F, Matos M, Coelho F.  2016.  Towards Quantifiable Eventual Consistency. Proceedings of the 6th International Conference on Cloud Computing and Services Science. :368-370.

In the pursuit of highly available systems, storage systems began offering eventually consistent data models. These models are suitable for a number of applications, but not for all. In this paper we discuss a system that can offer an eventually consistent data model but can also, when needed, offer a strongly consistent one.

Felber P, Pasin M, Rivière E, Schiavoni V, Sutra P, Coelho F, Matos M, Oliveira R, Vilaça R.  2014.  On the Support of Versioning in Distributed Key-Value Stores. 33rd IEEE International Symposium on Reliable Distributed Systems - SRDS.

The ability to access and query data stored in multiple versions is an important asset for many applications, such as Web graph analysis, collaborative editing platforms, data forensics, or correlation mining. The storage and retrieval of versioned data require a specific API and support from the storage layer. The choice of the data structures used to maintain versioned data has a fundamental impact on the performance of insertions and queries. The appropriate data structure also depends on the nature of the versioned data and the nature of the access patterns. In this paper we study the design and implementation space for providing versioning support on top of a distributed key-value store (KVS). We define an API for versioned data access supporting multiple writers and show that a plain KVS does not offer the necessary synchronization power for implementing this API. We leverage the support for listeners at the KVS level and propose a general construction for implementing arbitrary types of data structures for storing and querying versioned data. We explore the design space of versioned data storage, ranging from a flat data structure to a distributed sharded index. The resulting system, \system, is implemented on top of an industrial-grade open-source KVS, Infinispan. Our evaluation, based on real-world Wikipedia access logs, studies the performance of each versioning mechanism in terms of load balancing, latency and storage overhead in the context of different access scenarios.

Coelho F, Cruz F, Vilaça R, Pereira JO, Oliveira R.  2014.  pH1: A Transactional Middleware for NoSQL. 33rd IEEE International Symposium on Reliable Distributed Systems - SRDS.

To achieve high levels of scalability and availability, NoSQL databases opt not to offer important abstractions traditionally found in relational databases: transactional guarantees and strong data consistency. In this work we propose pH1, a generic middleware layer over NoSQL databases that offers transactional guarantees with Snapshot Isolation. This is achieved in a non-intrusive manner, requiring no modifications to servers and no native support for multiple versions. Instead, the transactional context is achieved by means of a multiversion distributed cache and an external transaction certifier, exposed by extending the client’s interface with transaction bracketing primitives. We validate and evaluate pH1 with Apache Cassandra and HyperDex. First, using the YCSB benchmark, we show that the cost of providing ACID guarantees to these NoSQL databases amounts to an 11% decrease in throughput. Moreover, using the transaction-intensive TPC-C workload, pH1 incurred a 22% decrease in throughput. This contrasts with OMID, a previous proposal that takes advantage of HBase’s support for multiple versions, with a throughput penalty of 76% under the same conditions.

Coelho F, Cruz F, Pereira JO, Vilaça R, Oliveira R.  2013.  pH1: middleware transaccional para NoSQL. INFORUM - Simpósio de Informática.

To achieve high scalability and availability, NoSQL databases opt not to offer important abstractions traditionally found in relational databases: transactional guarantees and strong data consistency criteria. These limitations make developing applications on top of NoSQL databases more complex and are therefore an obstacle to the adoption of the paradigm. In this work, we propose a middleware layer over NoSQL databases that offers transactional guarantees with Snapshot Isolation. The approach is non-intrusive, presenting clients with the same NoSQL interface extended with a transactional context. This transactional context is the core of our contribution and is built modularly on a distributed, non-persistent repository that keeps several versions of the manipulated data, and on a certifier of concurrent transactions. We present an implementation of our system, pH1, over Cassandra and, using a benchmark (YCSB) widely used to evaluate the performance of NoSQL databases, we measure the cost of supporting the transactional paradigm with ACID guarantees, which does not result in a significant degradation of operation latency when compared with Cassandra.

Coelho F.  2013.  Implementation and test of transactional primitives over Cassandra.

NoSQL databases opt not to offer important abstractions traditionally found in relational databases in order to achieve high levels of scalability and availability: transactional guarantees and strong data consistency. These limitations bring considerable complexity to the development of client applications and are therefore an obstacle to the broader adoption of the technology. In this work we propose a middleware layer over NoSQL databases that offers transactional guarantees with Snapshot Isolation. The proposed solution is achieved in a non-intrusive manner, providing to the clients the same interface as a NoSQL database, simply adding the transactional context. The transactional context is the focus of our contribution and is modularly based on a Non Persistent Version Store that holds several versions of elements and interacts with an external transaction certifier. In this work, we present an implementation of our system over Apache Cassandra and, by using two representative benchmarks, YCSB and TPC-C, we measure the cost of adding transactional support with ACID guarantees.