Doctoral Thesis Defense - Alexandre Campos Perez

Monday, July 23, 2018, 2:00pm

Title: Spectrum-based Diagnosis: Measurements, Improvements and Applications

Abstract: Debugging, the process of locating and fixing abnormal behavior in software, is often cited as one of the most costly and unpredictable phases of software development. It is therefore essential to minimize the impact of debugging on software development, while still ensuring the quality of software systems. Previous research has proposed several automated fault localization techniques that aid developers by pinpointing which software components are likely to be faulty. Among such techniques is Spectrum-based Fault Localization (SFL). SFL is a runtime technique that records the involvement of each software component in test cases (a record usually called the program spectrum) and reasons about its correlation with failing test outcomes. The assumption is that components frequently involved in failing tests are more likely to be faulty and, conversely, components more frequently involved in passing tests are less likely to be the root cause. The abstract nature of program spectra allows for a language-agnostic, lightweight analysis (compared to related approaches) with considerable accuracy. However, there are limited accounts of successful transitions of SFL into practice. In fact, studies show that developers quickly discard diagnostic reports after inspecting only a small number of the fault candidates reported by the technique. This thesis proposes several approaches aimed at enhancing the usefulness of SFL in practice. Namely, we introduce predictive measurements of diagnostic performance, improve fault comprehension and fault isolation, and apply SFL theory to the context of feature detection for program maintenance. First, we propose a runtime measurement, named DDU, aimed at assessing the effectiveness of a test suite at diagnosing potential faults in the code. DDU measures three traits found in highly diagnosable spectra, namely: moderate component involvement density, high diversity of test cases, and high component involvement unambiguity.
Through these traits, DDU ensures that distinct combinations of components are exercised in tandem to maximize the usefulness of SFL at pinpointing the cause of any error that may occur. The DDU diagnosability metric thereby serves as an indicator of the accuracy of SFL’s diagnostic reports for a given test suite (similarly to how adequacy measurements, such as branch coverage, act as indicators for fault detection), allowing users to measure, and also improve, the quality of their test suite. Second, we conduct a large-scale evaluation assessing how faults are actually fixed in practice. This evaluation is motivated by the fact that similarity-based SFL techniques are most effective when only one fault is responsible for all test failures. Similarity-based techniques cannot handle multiple simultaneous faults with the same degree of accuracy, unlike the more computationally expensive reasoning-based SFL techniques. Our hypothesis is that, in practice, faults are mostly detected and fixed in isolation, thereby resulting in single-fault localization problems, which can be diagnosed with lightweight similarity-based SFL variants. We propose a methodology for mining software repositories and classifying fixes according to the number of faults they address. Our evaluation found that 82% of all fixes were single-faulted, yielding high diagnostic accuracy when similarity-based variants were used. Third, we propose an enhancement to SFL that leverages concepts from Qualitative Reasoning (QR). QR is an area of research within Artificial Intelligence that studies ways to abstract complexity by partitioning continuous-valued variables into discrete qualitative states, which are subsequently used to model systems in a more lightweight, tractable manner. Similarly, our approach, named Q-SFL, partitions the runtime values of spectrum components into sets of qualitative states.
These states are then treated as contextual SFL components, so their involvement in tests is also recorded in the spectrum. The main advantages of such augmentation are increased fault isolation and improved fault comprehension. Our evaluation shows that augmenting SFL through qualitative partitioning can improve diagnostic accuracy, but further work is needed to develop effective automated partitioning strategies. Lastly, as a way to expand the applicability of SFL, we propose Spectrum-based Feature Comprehension (SFC). SFC provides a mapping of SFL concepts to the task of feature detection, which is typically the most time-consuming step in software maintenance. SFC shares many similarities with SFL, but instead of correlating component coverage with failing tests, it correlates components with feature involvement. Our user study shows that users were able to pinpoint features more accurately with the aid of SFC than with a test coverage tool. Furthermore, for cases where a mapping between tests and the features they exercise is not readily available (or where there are no tests at all), we propose Participatory Feature Detection (PFD). PFD allows users to manually label their interactions with a system as associated or dissociated with the feature they want to locate. Our evaluation of PFD shows that the technique is able to achieve considerable accuracy at detecting features, even when users misclassify their recorded interactions.
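
As a rough illustration of the similarity-based SFL idea described in the abstract, the sketch below ranks components by how strongly their involvement correlates with failing tests. The Ochiai coefficient, the function name `ochiai_ranking`, and the toy spectrum are assumptions chosen for illustration; the abstract does not state which similarity coefficient the thesis uses.

```python
import math


def ochiai_ranking(spectrum, failed):
    """Rank components by Ochiai similarity to the failure vector.

    spectrum[i][j] is True when test i involved component j;
    failed[i] is True when test i failed.
    Returns (component indices from most to least suspicious, scores).
    """
    n_components = len(spectrum[0]) if spectrum else 0
    total_failed = sum(1 for f in failed if f)
    scores = []
    for j in range(n_components):
        # n11: failing tests that involve component j
        n11 = sum(1 for row, f in zip(spectrum, failed) if row[j] and f)
        # n10: passing tests that involve component j
        n10 = sum(1 for row, f in zip(spectrum, failed) if row[j] and not f)
        denom = math.sqrt(total_failed * (n11 + n10))
        scores.append(n11 / denom if denom else 0.0)
    ranking = sorted(range(n_components), key=lambda j: scores[j], reverse=True)
    return ranking, scores


# Hypothetical spectrum: 4 tests (rows) over 3 components (columns).
spectrum = [
    [True, True, False],   # test 0, passed
    [False, True, True],   # test 1, failed
    [True, False, True],   # test 2, failed
    [True, False, False],  # test 3, passed
]
failed = [False, True, True, False]

ranking, scores = ochiai_ranking(spectrum, failed)
print(ranking)  # component 2 comes first: it is involved only in failing tests
```

In this toy case a single component explains every failure, which is the single-fault setting where the abstract says similarity-based variants are most effective.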

Jury: Doctor Franz Wotawa (Institute for Software Technology of the Technical University of Graz)

         Doctor Duarte Nuno Jardim Nunes (IST)

         Doctor Rui Filipe Lima Maranhão de Abreu (IST)

         Doctor João Alexandre Baptista Vieira Saraiva (University of Minho)

         Doctor João Manuel Paiva Cardoso (University of Porto)

         Doctor João Carlos Pascoal Faria (University of Porto)

Location: Sala de Atos da FEUP