About

I am a Lecturer at the Department of Informatics at the University of Minho. I am also a researcher at HASLab/INESC TEC. My research interests focus mainly on machine learning and data mining. Occasionally, I participate in Bioinformatics research projects involving the analysis of molecular dynamics simulations of protein folding/unfolding.

I hold a PhD in Computing from Imperial College (University of London), where I did research in logic programming. I have been working on the development of association rule mining algorithms and novel patterns to capture distribution learning. I am also interested in social network analysis, graph mining, subgroup mining and motif discovery in time series.

Details

  • Name

    Paulo Jorge Azevedo
  • Role

    Senior Researcher
  • Since

    1st November 2011
Publications

2023

Subgroup mining for performance analysis of regression models

Authors
Pimentel, J; Azevedo, PJ; Torgo, L;

Publication
EXPERT SYSTEMS

Abstract
Machine learning algorithms have shown several advantages compared to humans, namely in terms of the scale of data that can be analysed, delivering high speed and precision. However, it is not always possible to understand how algorithms work. As a result of the complexity of some algorithms, users started to feel the need to ask for explanations, boosting the relevance of Explainable Artificial Intelligence. This field aims to explain and interpret models with the use of specific analytical methods that usually analyse how their predicted values and/or errors behave. While prediction analysis is widely studied, performance analysis has limitations for regression models. This paper proposes a rule-based approach, Error Distribution Rules (EDRs), to uncover atypical error regions, while considering multivariate feature interactions without size restrictions. Extracting EDRs is a form of subgroup mining. EDRs are model agnostic and a drill-down technique to evaluate regression models, which consider multivariate interactions between predictors. EDRs uncover regions of the input space with deviating performance providing an interpretable description of these regions. They can be regarded as a complementary tool to the standard reporting of the expected average predictive performance. Moreover, by providing interpretable descriptions of these specific regions, EDRs allow end users to understand the dangers of using regression tools for some specific cases that fall on these regions, that is, they improve the accountability of models. The performance of several models from different problems was studied, showing that our proposal allows the analysis of many situations and direct model comparison. In order to facilitate the examination of rules, two visualization tools based on boxplots and density plots were implemented. A network visualization tool is also provided to rapidly check interactions of every feature condition. 
An additional tool is provided by using a grid of boxplots, where comparison between quartiles of every distribution with a reference is performed. Based on this comparison, an extrapolation of counterfactual examples to regression was also implemented. A set of examples is described, including a setting where regression model performance is compared in detail using EDRs. Specifically, the error difference between two models in a dataset is studied by deriving rules highlighting regions of the input space where the difference in model performance is unexpected. The application of visual tools is illustrated using EDR examples derived from publicly available datasets. Also, case studies illustrating the specialization of subgroups, the identification of counterfactual subgroups and the detection of unanticipated complex models are presented. This paper extends the state of the art by providing a method to derive explanations for model performance instead of explanations for model predictions.

2023

Social network analytics and visualization: Dynamic topic-based influence analysis in evolving micro-blogs

Authors
Tabassum, S; Gama, J; Azevedo, PJ; Cordeiro, M; Martins, C; Martins, A;

Publication
EXPERT SYSTEMS

Abstract
Influence Analysis is one of the well-known areas of Social Network Analysis. However, discovering influencers from micro-blog networks based on topics has recently gained popularity due to its specificity. Besides, these data networks are massive, continuous and evolving. Therefore, to address the above challenges we propose a dynamic framework for topic modelling and identifying influencers in the same process. It incorporates dynamic sampling, community detection and network statistics over a graph data stream from a social media activity management application. Further, we compare the graph measures against each other empirically and observe that there is no evidence of correlation between the sets of users having a large number of friends and the users whose posts achieve high acceptance (i.e., highly liked, commented and shared posts). Therefore, we propose a novel approach that incorporates a user's reachability and also acceptability by other users. Consequently, we improve on graph metrics by including a dynamic acceptance score (integrating content quality with network structure) for ranking influencers in micro-blogs. Additionally, we analysed the topic clusters' structure and quality with empirical experiments and visualization.

2020

Sequence Mining for Automatic Generation of Software Tests from GUI Event Traces

Authors
Oliveira, A; Freitas, R; Jorge, A; Amorim, V; Moniz, N; Paiva, ACR; Azevedo, PJ;

Publication
Intelligent Data Engineering and Automated Learning - IDEAL 2020 - 21st International Conference, Guimaraes, Portugal, November 4-6, 2020, Proceedings, Part II

Abstract
In today’s software industry, systems are constantly changing. To maintain their quality and to prevent failures at controlled costs is a challenge. One way to foster quality is through thorough and systematic testing. Therefore, the definition of adequate tests is crucial for saving time, cost and effort. This paper presents a framework that generates software test cases automatically based on user interaction data. We propose a data-driven software test generation solution that combines the use of frequent sequence mining and Markov chain modeling. We assess the quality of the generated test cases by empirically evaluating their coverage with respect to observed user interactions and code. We also measure the plausibility of the distribution of the events in the generated test sets using the Kullback-Leibler divergence.

2019

Preference rules for label ranking: Mining patterns in multi-target relations

Authors
de Sá, CR; Azevedo, PJ; Soares, C; Jorge, AM; Knobbe, AJ;

Publication
CoRR

Abstract

2018

Preference rules for label ranking: Mining patterns in multi-target relations

Authors
de Sa, CR; Azevedo, P; Soares, C; Jorge, AM; Knobbe, A;

Publication
INFORMATION FUSION

Abstract
In this paper, we investigate two variants of association rules for preference data, Label Ranking Association Rules and Pairwise Association Rules. Label Ranking Association Rules (LRAR) are the equivalent of Class Association Rules (CAR) for the Label Ranking task. In CAR, the consequent is a single class, to which the example is expected to belong. In LRAR, the consequent is a ranking of the labels. The generation of LRAR requires special support and confidence measures to assess the similarity of rankings. In this work, we carry out a sensitivity analysis of these similarity-based measures. We want to understand which datasets benefit more from such measures and which parameters have more influence on the accuracy of the model. Furthermore, we propose an alternative type of rules, the Pairwise Association Rules (PAR), which are defined as association rules with a set of pairwise preferences in the consequent. While PAR can be used both as descriptive and predictive models, they are essentially descriptive models. Experimental results show the potential of both approaches.

Supervised theses

2022

Formalization of Deep Learning Techniques with the Why3 Proof Platform

Author
Márcio Alexandre Mota Sousa

Institution
UM

2021

Interpretabilidade em Aprendizagem Máquina num Contexto de Modelos de Regressão Caixa Negra

Author
João Pedro Torres Pimentel

Institution
UM

2020

Active Learning for Fraud Detection

Author
Miguel Lobo Pinto Leite

Institution
UM

2019

Active Learning and Intelligent Queues

Author
Miguel Lobo Pinto Leite

Institution
UM