Publications

Coelho F, Paulo J, Vilaça R, Pereira JO, Oliveira R.  2017.  HTAPBench: Hybrid Transactional and Analytical Processing Benchmark. Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering. :293–304. Abstract
n/a
Pontes R, Pinto M, Barbosa M, Vilaça R, Matos M, Oliveira R.  2017.  Performance trade-offs on a secure multi-party relational database. Proceedings of the Symposium on Applied Computing, {SAC} 2017, Marrakech, Morocco, April 3-7, 2017. :456–461. Abstract
n/a
Coelho F, Pereira JO, Vilaça R, Oliveira R.  2016.  Holistic Shuffler for the Parallel Processing of SQL Window Functions. Distributed Applications and Interoperable Systems - 16th {IFIP} {WG} 6.1 International Conference, {DAIS} 2016, Held as Part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, Heraklion, Crete, Greece, June. :75–81. Abstractholistic-proceedings.pdf

Window functions are a sub-class of analytical operators that allow data to be handled in a derived view of a given relation, while taking into account their neighboring tuples. Currently, systems bypass parallelization opportunities which become especially relevant when considering Big Data as data is naturally partitioned.
We present a shuffling technique to improve the parallel execution of window functions when data is naturally partitioned when the query holds a partitioning clause that does not match the natural partitioning of the relation. We evaluated this technique with a non-cumulative ranking function and we were able to reduce data transfer among parallel workers in 85% when compared to a naive approach.

Coelho F, Pereira JO, Vilaça R, Oliveira R.  2016.  Reducing Data Transfer in Parallel Processing of SQL Window Functions. Proceedings of the 6th International Conference on Cloud Computing and Services Science. :343-347. Abstractdatadiversityconvergence_2016_1_copy.pdf

Window functions are a sub-class of analytical operators that allow data to be handled in a derived view of a given relation, while taking into account their neighboring tuples. We propose a technique that can be used in the parallel execution of this operator when data is naturally partitioned. The proposed method benefits the cases where the required partitioning is not the natural partitioning employed. Preliminary evaluation shows that we are able to limit data transfer among parallel workers to 14\% of the registered transfer when using a naive approach.

Cruz F, Maia F, Matos M, Oliveira R, Paulo J, Pereira JO, Vilaça R.  2016.  Resource Usage Prediction in Distributed Key-Value Datastores. Distributed Applications and Interoperable Systems: 16th IFIP WG 6.1 International Conference, DAIS 2016, Held as Part of the 11th International Federated Conference on Distributed Computing Techniques, DisCoTec 2016, Heraklion, Crete, Greece, June 6-9, 2. :144–159. Abstract

n/a

Pontes R, Maia F, Paulo J, Vilaça R.  2016.  SafeRegions: Performance Evaluation of Multi-party Protocols on HBase. 2016 IEEE 35th Symposium on Reliable Distributed Systems Workshops (SRDSW), 2016 . :31-36.13.pdf
Neves F, Pereira JO, Vilaça R, Oliveira R.  2015.  Otimização do HBase para dados estruturados. INForum 2015. :235-247. Abstractpscan.pdf

Os sistemas NoSQL escalam melhor que os tradicionais sistemas relacionais, motivando a migração de inúmeras aplicações para sistemas NoSQL mesmo quando não se tira partido da estrutura de dados dinâmica por eles fornecida. Porém, a consulta destes dados estruturados tem um custo adicional que deriva da flexibilidade dos sistemas NoSQL. Neste documento, analisa-se esse custo no Apache HBase e apresenta-se o Prepared Scan, uma operação adicional de consulta que visa tirar partido do conhecimento da estrutura de dados por parte da aplicação, diminuindo assim o custo associado à consulta de dados estruturados. Recorrendo à ferramenta de benchmarking YCSB, verifica-se que esta solução tem um aumento no débito de aproximadamente 29%.

Felber P, Pasin M, Rivière E, Schiavoni V, Sutra P, Coelho F, Matos M, Oliveira R, Vilaça R.  2014.  On the Support of Versioning in Distributed Key-Value Stores. 33rd IEEE International Symposium on Reliable Distributed Systems - SRDS. Abstractpaper.pdf

The ability to access and query data stored in multiple versions is an important asset for many applications, such as Web graph analysis, collaborative editing platforms, data forensics, or correlation mining. The storage and retrieval of versioned data requires a specific API and support from the storage layer. The choice of the data structures used to maintain versioned data has a fundamental impact on the performance of insertions and queries. The appropriate data structure also depends on the nature of the versioned data and the nature of the access patterns. In this paper we study the design and implementation space for providing versioning support on top of a distributed key-value store (KVS). We define an API for versioned data access supporting multiple writers and show that a plain KVS does not offer the necessary synchronization power for implementing this API. We leverage the support for listeners at the KVS level and propose a general construction for implementing arbitrary types of data structures for storing and querying versioned data. We explore the design space of versioned data storage ranging from a flat data structure to a distributed sharded index. The resulting system, \system, is implemented on top of an industrial-grade open-source KVS, Infinispan. Our evaluation, based on real-world Wikipedia access logs, studies the performance of each versioning mechanisms in terms of load balancing, latency and storage overhead in the context of different access scenarios.

Maia F, Matos M, Vilaça R, Pereira JO, Oliveira R, Rivière E.  2014.  DATAFLASKS: epidemic store for massive scale systems. The 33rd IEEE Symposium on Reliable Distributed Systems (SRDS 2014) . Abstractmain.pdf

Very large scale distributed systems provide some of the most interesting research challenges while at the same time being increasingly required by nowadays applications. The escalation in the amount of connected devices and data being produced and exchanged, demands new data management systems. Although new data stores are continuously being proposed, they are not suitable for very large scale environments. The high levels of churn and constant dynamics found in very large scale systems demand robust, proactive and unstructured approaches to data management. In this paper we propose a novel data store solely based on epidemic (or gossip-based) protocols. It leverages the capacity of these protocols to provide data persistence guarantees even in highly dynamic, massive scale systems. We provide an open source prototype of the data store and correspondent evaluation.

Coelho F, Cruz F, Vilaça R, Pereira JO, Oliveira R.  2014.  pH1: A Transactional Middleware for NoSQL. 33rd IEEE International Symposium on Reliable Distributed Systems - SRDS. Abstractph1.pdf

NoSQL databases opt not to offer important abstractions traditionally found in relational databases in order to achieve high levels of scalability and availability: transactional guarantees and strong data consistency.
In this work we propose pH1, a generic middleware layer over NoSQL databases that offers transactional guarantees with Snapshot Isolation. This is achieved in a non-intrusive manner,
requiring no modifications to servers and no native support for multiple versions. Instead, the transactional context is achieved by means of a multiversion distributed cache and an external
transaction certifier, exposed by extending the client’s interface with transaction bracketing primitives.
We validate and evaluate pH1 with Apache Cassandra and Hyperdex. First, using the YCSB benchmark, we show that the cost of providing ACID guarantees to these NoSQL databases
amounts to 11% decrease in throughput.
Moreover, using the transaction intensive TPC-C workload, pH1 presented an impact of 22% decrease in throughput. This contrasts with OMID, a previous proposal that takes advantage of
HBase’s support for multiple versions, with a throughput penalty of 76% in the same conditions.

Cruz F, Maia F, Oliveira R, Vilaça R.  2014.  Workload-aware table splitting for NoSQL. SAC - Proceedings of the ACM Symposium on Applied Computing. Abstractdads14.pdf

Massive scale data stores, which exhibit highly desirable scalability and availability properties are becoming pivotal systems in nowadays infrastructures. Scalability achieved by these data stores is anchored on data independence; there is no clear relationship between data, and atomic inter-node operations are not a concern. Such assumption over data allows aggressive data partitioning. In particular, data tables are horizontally partitioned and spread across nodes for load balancing. However, in current versions of these data stores, partitioning is either a manual process or automated but simply based on table size. We argue that size based partitioning does not lead to acceptable load balancing as it ignores data access patterns, namely data hotspots. Moreover, manual data partitioning is cumbersome and typically infeasible in large scale scenarios. In this paper we propose an automated table splitting mechanism that takes into account the system workload. We evaluate such mechanism showing that it simple, non-intrusive and e fective.

Cruz F, Maia F, Matos M, Oliveira R, Paulo J, Pereira JO, Vilaça R.  2013.  MeT: Workload aware elasticity for NoSQL. Proceedings of the 8th European Conference on Computer Systems - EuroSys. :183–196. Abstractmet.pdf

NoSQL databases manage the bulk of data produced by modern Web applications such as social networks. This stems from their ability to partition and spread data to all available nodes, allowing NoSQL systems to scale. Unfortunately, current solutions' scale out is oblivious to the underlying data access patterns, resulting in both highly skewed load across nodes and suboptimal node configurations.
In this paper, we first show that judicious placement of HBase partitions taking into account data access patterns can improve overall throughput by 35%. Next, we go beyond current state of the art elastic systems limited to uninformed replica addition and removal by: i) reconfiguring existing replicas according to access patterns and ii) adding replicas specifically configured to the expected access pattern.
MeT is a prototype for a Cloud-enabled framework that can be used alone or in conjunction with OpenStack for the automatic and heterogeneous reconfiguration of a HBase deployment. Our evaluation, conducted using the YCSB workload generator and a TPC-C workload, shows that MeT is able to i) autonomously achieve the performance of a manual configured cluster and ii) quickly reconfigure the cluster according to unpredicted workload changes.

Coelho F, Cruz F, Pereira JO, Vilaça R, Oliveira R.  2013.  pH1: middleware transaccional para NoSQL. INFORUM - Simpósio de Informática. Abstractph1.pdf

As bases de dados NoSQL optam por não oferecer importantes abstracções tradicionalmente encontradas nas bases de dados relacionais, de modo a atingir elevada escalabilidade e disponibilidade: garantias transacionais e critérios de coerência de dados fortes. Estas limitações resultam em maior complexidade no desenvolvimento de aplicações sobre bases de dados NoSQL e, logo, são um obstáculo à adoção do paradigma. Neste trabalho, propomos uma camada de middleware sobre bases de dados NoSQL que oferece garantias transacionais com Snapshot Isolation. A abordagem é não-intrusiva, apresentando aos clientes a mesma interface NoSQL acrescida do contexto transacional. Este contexto transacional é o cerne da nossa contribuição e assenta modularmente num repositório distribuído, não persistente, que mantém várias versões dos dados manipulados e num certificador de transaçõess concorrentes. Apresentamos uma implementação do nosso sistema pH1 sobre Cassandra e, recorrendo a um benchmark (YCSB) extensamente utilizado na avaliação de desempenho de bases de dados NoSQL, medimos o custo do suporte do paradigma transacional com garantias transacionais ACID, que não resulta numa diminuição significativa da latência das operações quando comparado com o Cassandra.

Maia F, Matos M, Vilaça R, Pereira JO, Oliveira R, Rivière E.  2013.  DataFlasks: an epidemic dependable key-value substrate. Proceedings of the 3rd International Workshop on Dependability of Clouds, Data Centers and Virtual Computing Environments (with DSN 2013). Abstractdataflasks_dcdv13.pdf

Recently, tuple-stores have become pivotal struc- tures in many information systems. Their ability to handle large datasets makes them important in an era with unprecedented amounts of data being produced and exchanged. However, these tuple-stores typically rely on structured peer-to-peer protocols which assume moderately stable environments. Such assumption does not always hold for very large scale systems sized in the scale of thousands of machines. In this paper we present a novel approach to the design of a tuple-store. Our approach follows a stratified design based on an unstructured substrate. We focus on this substrate and how the use of epidemic protocols allow reaching high dependability and scalability.

Vilaça R, Cruz F, Pereira JO, Oliveira R.  2013.  An Effective Scalable SQL Engine for NoSQL Databases. Proceedings of the 13th IFIP Distributed Applications and Interoperable Systems (DAIS). Abstractpaper_29.pdf

NoSQL databases were initially devised to support a few concrete extreme scale applications. Since the specificity and scale of the target systems justified the investment of manually crafting application code their limited query and indexing capabilities were not a major im- pediment. However, with a considerable number of mature alternatives now available there is an increasing willingness to use NoSQL databases in a wider and more diverse spectrum of applications and, to most of them, hand-crafted query code is not an enticing trade-off. In this paper we address this shortcoming of current NoSQL databases with an effective approach for executing SQL queries while preserving their scalability and schema flexibility. We show how a full-fledged SQL engine can be integrated atop of HBase leading to an ANSI SQL compli- ant database. Under a standard TPC-C workload our prototype scales linearly with the number of nodes in the system and outperforms a NoSQL TPC-C implementation optimized for HBase.