Carlos Baquero

Porto, Portugal

+351 222 094 000

info@inesctec.pt

My research interests cover data management in eventually consistent settings, distributed data aggregation, and causality tracking. In recent years, I have worked with my co-authors on data summary mechanisms such as Scalable Bloom Filters, on causality tracking for dynamic settings with Interval Tree Clocks and Dotted Version Vectors, and on predictable eventual consistency with Conflict-Free Replicated Data Types (CRDTs). My recent work has been adopted in the Riak distributed database and in Akka Distributed Data, and runs in production systems serving millions of users worldwide.

About

Details

Name: Carlos Baquero
Role: Area Manager
Since: 1 November 2011
Nationality: Portuguese
Centre: High-Assurance Software
Contacts: +351 253 604 440, carlos.baquero@inesctec.pt

Publications

2024

Performance and explainability of feature selection-boosted tree-based classifiers for COVID-19 detection

Authors
Rufino, J; Ramírez, JM; Aguilar, J; Baquero, C; Champati, J; Frey, D; Lillo, RE; Fernández-Anta, A

Publication
Heliyon

Abstract
In this paper, we evaluate the performance and analyze the explainability of machine learning models boosted by feature selection in predicting COVID-19-positive cases from self-reported information. In essence, this work describes a methodology to identify COVID-19 infections that considers the large amount of information collected by the University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS). More precisely, this methodology performs a feature selection stage based on the recursive feature elimination (RFE) method to reduce the number of input variables without compromising detection accuracy. A tree-based supervised machine learning model is then optimized with the selected features to detect COVID-19-active cases. In contrast to previous approaches that use a limited set of selected symptoms, the proposed approach builds the detection engine considering a broad range of features including self-reported symptoms, local community information, vaccination acceptance, and isolation measures, among others. To implement the methodology, three different supervised classifiers were used: random forests (RF), light gradient boosting (LGB), and extreme gradient boosting (XGB). Based on data collected from the UMD-CTIS, we evaluated the detection performance of the methodology for four countries (Brazil, Canada, Japan, and South Africa) and two periods (2020 and 2021). The proposed approach was assessed in terms of various quality metrics: F1-score, sensitivity, specificity, precision, receiver operating characteristic (ROC), and area under the ROC curve (AUC). This work also shows the normalized daily incidence curves obtained by the proposed approach for the four countries. Finally, we perform an explainability analysis using Shapley values and feature importance to determine the relevance of each feature and the corresponding contribution for each country and each country/year.
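Most of the quality metrics listed above are standard functions of a binary confusion matrix. The sketch below shows how sensitivity, specificity, precision, and F1-score follow from counts of true/false positives and negatives. The names `DetectionMetrics` and `detection_metrics` are hypothetical, not taken from the paper; ROC and AUC are left out because they need per-case classifier scores rather than counts.

```python
from dataclasses import dataclass


@dataclass
class DetectionMetrics:
    sensitivity: float  # true positive rate (recall): TP / (TP + FN)
    specificity: float  # true negative rate: TN / (TN + FP)
    precision: float    # positive predictive value: TP / (TP + FP)
    f1: float           # harmonic mean of precision and sensitivity


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 for an empty denominator (e.g. no positive cases) instead of raising.
    return num / den if den else 0.0


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> DetectionMetrics:
    """Derive threshold-based quality metrics from confusion-matrix counts."""
    sensitivity = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    precision = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    return DetectionMetrics(sensitivity, specificity, precision, f1)
```

With hypothetical counts (not results from the paper), `detection_metrics(tp=80, fp=20, tn=180, fn=20)` gives a sensitivity of 0.8, a specificity of 0.9, a precision of 0.8, and an F1-score of about 0.8.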

2023

Using survey data to estimate the impact of the omicron variant on vaccine efficacy against COVID-19 infection

Authors
Rufino, J; Baquero, C; Frey, D; Glorioso, CA; Ortega, A; Rescic, N; Roberts, JC; Lillo, RE; Menezes, R; Champati, JP; Anta, AF

Publication
Scientific Reports

Abstract
Symptom-based detection of SARS-CoV-2 infection is not a substitute for precise diagnostic tests, but it can provide insight into the likely level of infection in a given population. This study uses symptom data collected in the Global COVID-19 Trends and Impact Surveys (UMD Global CTIS) and variant sequencing data from GISAID. This work, conducted in January 2022 during the emergence of the Omicron variant (subvariant BA.1), aims to improve the quality of infection detection from the available symptoms and to use the resulting estimates of infection levels to assess changes in vaccine efficacy during a change of dominant variant: from the Delta-dominant to the Omicron-dominant period. Our approach produced a new symptom-based classifier, based on Random Forest, which was evaluated against a ground-truth subset of cases with known diagnostic test status. This classifier was compared with other competing classifiers and showed improved performance with respect to the ground-truth data. Using the Random Forest classifier, and knowing the vaccination status of the subjects, we then analysed the evolution of vaccine efficacy against infection across different periods, geographies, and dominant variants. In South Africa, where the first significant wave of Omicron occurred, a significant reduction in vaccine efficacy is observed from August-September 2021 to December 2021. For instance, efficacy drops from 0.81 to 0.30 for those vaccinated with two doses (of Pfizer/BioNTech), and from 0.51 to 0.09 for those vaccinated with one dose (of Pfizer/BioNTech or Johnson & Johnson). We also extended the study to other countries in which Omicron had been detected, comparing the situation in October 2021 (before Omicron) with that of December 2021. While the measured reduction is smaller than in South Africa, we still found, for instance, an average drop in vaccine efficacy from 0.53 to 0.45 among those vaccinated with two doses.
Moreover, we found a significant negative (Pearson) correlation of around -0.6 between the measured prevalence of Omicron in several countries and the vaccine efficacy in those same countries. This prediction, made in January 2022, of decreased vaccine efficacy against Omicron is in line with the subsequent increase in Omicron infections in the first half of 2022.
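The abstract reports vaccine efficacy values but does not state the estimator. Assuming the standard epidemiological definition, efficacy is one minus the ratio of infection rates (attack rates) in the vaccinated and unvaccinated groups, where, in this setting, the infection counts would be the classifier's estimates rather than test results. The sketch below computes that quantity; the function names `attack_rate` and `vaccine_efficacy` are hypothetical.

```python
def attack_rate(infected: int, population: int) -> float:
    """Fraction of a group that is (estimated to be) infected."""
    if population <= 0:
        raise ValueError("population must be positive")
    return infected / population


def vaccine_efficacy(infected_vax: int, total_vax: int,
                     infected_unvax: int, total_unvax: int) -> float:
    """Standard relative-risk definition: VE = 1 - AR_vaccinated / AR_unvaccinated.

    Assumption: infection counts come from a symptom-based classifier, so the
    result is only as reliable as those estimates.
    """
    ar_vax = attack_rate(infected_vax, total_vax)
    ar_unvax = attack_rate(infected_unvax, total_unvax)
    if ar_unvax == 0:
        raise ValueError("no infections among the unvaccinated: VE is undefined")
    return 1.0 - ar_vax / ar_unvax
```

With hypothetical counts of 30 estimated infections among 1,000 vaccinated respondents and 100 among 1,000 unvaccinated ones, `vaccine_efficacy(30, 1000, 100, 1000)` returns approximately 0.7; a value near 0 means both groups are infected at similar rates.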

2023

Time-limited Bloom Filter

Authors
Rodrigues, A; Shtul, A; Baquero, C; Almeida, PS

Publication
38th Annual ACM Symposium on Applied Computing, SAC 2023

Abstract
A Bloom Filter is a probabilistic data structure designed to check, quickly and memory-efficiently, whether an element is present in a set. It has been widely used in various computing areas, and several variants supporting deletions, dynamic sets, and sliding windows have emerged over the years. When summarizing data streams, it becomes relevant to identify the more recent elements in the stream. However, most sliding-window schemes consider the most recent items of a data stream without taking time into account. While this allows, e.g., storing the most recent 10000 elements, it does not easily translate into storing the elements received in the last 60 seconds, unless the insertion rate is stable and known in advance. In this paper, we present the Time-limited Bloom Filter, a new BF-based approach that retains the elements inserted during a given time period, correctly reports them as present when queried, and retires them once they become stale. The approach supports variable insertion rates while striving to keep a target false positive rate. We also make available a reference implementation of the data structure as a Redis module.
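The abstract does not describe the data structure's internals. A common way to make a Bloom filter forget by time, rather than by element count, is to split it into generations that each cover a fixed time slice and to drop whole generations once they fall outside the window. The sketch below illustrates only that generic idea; it is not the paper's Time-limited Bloom Filter, and it neither adapts to variable insertion rates nor enforces a target false positive rate as the paper's approach aims to. The class name, slice count, filter size, and SHA-256-based double hashing are all assumptions.

```python
import hashlib
from collections import deque


class TimeSlicedBloomFilter:
    """Hypothetical sketch: a time window covered by a queue of per-slice Bloom filters.

    Assumes timestamps passed to add/might_contain never decrease.
    """

    def __init__(self, window_seconds: float, slices: int = 4,
                 bits_per_slice: int = 1024, num_hashes: int = 3):
        self.window = window_seconds
        self.slice_len = window_seconds / slices
        self.m = bits_per_slice
        self.k = num_hashes
        # Each entry is (slice_start_time, bit_array); the newest slice is on the right.
        self.generations = deque()

    def _positions(self, item: str) -> list:
        # Double hashing: k bit positions derived from two 64-bit chunks of SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def _expire(self, now: float) -> None:
        # Retire whole slices whose time span ended before the window start.
        window_start = now - self.window
        while self.generations and self.generations[0][0] + self.slice_len <= window_start:
            self.generations.popleft()

    def add(self, item: str, now: float) -> None:
        self._expire(now)
        # Open a fresh slice if there is none or the newest one has run its course.
        if not self.generations or now >= self.generations[-1][0] + self.slice_len:
            self.generations.append((now, bytearray(self.m)))
        bits = self.generations[-1][1]
        for p in self._positions(item):
            bits[p] = 1

    def might_contain(self, item: str, now: float) -> bool:
        # False is definite; True means "probably added within the window".
        self._expire(now)
        positions = self._positions(item)
        return any(all(bits[p] for p in positions) for _, bits in self.generations)
```

With `window_seconds=60` and four 15-second slices, an element added at time 0 is still reported at time 60 and is forgotten by time 75: every element survives at least `window_seconds` and at most one extra slice length. More slices tighten that bound at the cost of checking more sub-filters per query.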

2023

Probabilistic Causal Contexts for Scalable CRDTs

Authors
Fernandes, PH; Baquero, C

Publication
Proceedings of the 10th Workshop on Principles and Practice of Consistency for Distributed Data, PaPoC 2023, Rome, Italy, 8 May 2023

2023

A Year Embedded in the Crypto-NFT Space

Authors
Baquero, C

Publication
Communications of the ACM

Supervised Theses

2022

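Interface Design for an Integrated Clinical Registry Platform for Cystic Fibrosis Patients at a National Reference Centre (original title: Design de Interface para uma Plataforma de Registo Clínico Integrado de Doentes com Fibrose Quística num Centro de Referência Nacional)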

Author
Maria Teresa Santos Quelhas Pinto Leite

Institution
UP-FEUP

2021

Design and Usability Study of an Online Patient Management Platform (original title: Desenho e estudo de usabilidade de uma plataforma de gestão de pacientes on-line)

Author
Artur Sousa Ferreira

Institution
UP-FEUP

2021

Autonomous navigation with simultaneous localization and mapping in/outdoor

Author
Carlos Miguel da Silva de Freitas

Institution
UP-FEUP

2021

Creating Java Applications from Orally Expressed Requirements (original title: Criação de aplicações Java a partir de requisitos expressos oralmente)

Author
João Nuno Gomes Rodrigues de Almeida

Institution
UM

2020

Scalable and Configurable Event Processing Engine

Author
Edgar de Lemos Passos

Institution
UP-FEUP

SMILES

Lightkone

DaVinci
