Gonçalves RC, Pereira JO, Jimenez-Peris R. 2016. An RDMA Middleware for Asynchronous Multi-stage Shuffling in Analytical Processing. DAIS '16: Proceedings of the 16th IFIP International Conference on Distributed Applications and Interoperable Systems.

A key component in large-scale distributed analytical processing is shuffling, the distribution of data to multiple nodes so that the computation can be done in parallel. In this paper we describe the design and implementation of a communication middleware to support data shuffling for executing multi-stage analytical processing operations in parallel. The middleware relies on RDMA (Remote Direct Memory Access) to provide basic operations to asynchronously exchange data among multiple machines. Experimental results show that the RDMA-based middleware can reduce the cost of communication operations in parallel analytical processing tasks by 75% when compared with a sockets-based middleware.
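
As a rough illustration of what shuffling means here, the sketch below redistributes key/value records so that all records with the same key end up on the same node. It is an in-memory toy, not the paper's middleware: it involves no RDMA or networking, and the names `partition` and `shuffle`, as well as the choice of CRC32 hash partitioning, are assumptions made only for this example.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_nodes: int) -> int:
    """Pick a destination node for a key using a deterministic hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def shuffle(local_batches, num_nodes):
    """Redistribute (key, value) records so that equal keys meet on one node.

    local_batches[i] holds the records node i produced in the previous stage.
    Returns one inbox (key -> list of values) per destination node.
    """
    inboxes = [defaultdict(list) for _ in range(num_nodes)]
    for records in local_batches:
        for key, value in records:
            inboxes[partition(key, num_nodes)][key].append(value)
    return [dict(inbox) for inbox in inboxes]


# Two producer nodes, three consumer nodes (hypothetical data).
batches = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
result = shuffle(batches, 3)
```

In a real multi-stage setting each inbox would be filled by remote writes while the next stage is already consuming data; the sketch only captures the partitioning step.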

Neves DT, Gonçalves RC. 2015. On the Synthesis and Reconfiguration of Pipelines. ARCS '15: Proceedings of the 28th International Conference on Architecture of Computing Systems.

In recent years we have observed great advances in parallel platforms and the exponential growth of datasets in several domains. Undoubtedly, parallel programming is crucial to harness the performance potential of such platforms and to cope with very large datasets. However, quite often one has to deal with legacy software systems that may use third-party frameworks, libraries, or tools, and that may be executed on different multicore architectures. Managing different software configurations and adapting them to different needs is an arduous task, particularly when it has to be carried out by scientists or when dealing with irregular applications. In this paper, we present an approach to abstract legacy software systems using workflow modeling tools. We show how a basic pipeline is modeled and adapted, using model transformations, to different application scenarios, either to obtain better performance or more reliable results. Moreover, we explain how the system we provide to support the approach is easily extensible to accommodate new tools and algorithms. We show how a pipeline of three irregular applications, all from phylogenetics, is mapped to parallel implementations. Our studies show that the derived programs neither degrade performance nor sacrifice scalability, even in the presence of a set of asymmetric tasks and when using third-party tools.

Batory D, Gonçalves RC, Marker B, Siegmund J. 2013. Dark Knowledge and Graph Grammars in Automated Software Design. SLE '13: Proceedings of the 6th International Conference on Software Language Engineering.

Mechanizing the development of hard-to-write and costly-to-maintain software is the core problem of automated software design. Encoding expert knowledge (a.k.a. dark knowledge) about a software domain is central to its solution. We assert that a solution can be cast in terms of the ideas of language design and engineering. Graph grammars can be a foundation for modern automated software development. The sentences of a grammar are designs of complex dataflow systems. We explain how graph grammars provide a framework to encode expert knowledge, produce correct-by-construction derivations of dataflow applications, enable the generation of high-performance code, and improve how software design of dataflow applications can be taught to undergraduates.

Gonçalves RC, Sobral JL. 2012. Modular and Non-Invasive Distributed Memory Parallelization. MISS '12: Proceedings of the 2012 workshop on Modularity in Systems Software. :33–38.

This paper presents an aspect-oriented library to support parallelization of Java applications for distributed memory environments, using a message-passing approach. The library was implemented using the AspectJ language, and aims to provide a set of mechanisms that make it easier to parallelize applications, as well as to solve well-known problems of parallelization, such as lack of modularity and reusability. We compare the advantages of this method over the traditional approach, and we discuss how it differs from recent approaches that address the same problem. Results show benefits over other approaches and, in most cases, competitive performance.

Riché TL, Gonçalves RC, Marker B, Batory D. 2012. Pushouts in Software Architecture Design. GPCE '12: Proceedings of the 11th International Conference on Generative Programming and Component Engineering. :84–92.

A classical approach to program derivation is to progressively extend a simple specification and then incrementally refine it to an implementation. We claim this approach is hard or impractical when reverse engineering legacy software architectures. We present a case study that shows optimizations and pushouts—in addition to refinements and extensions—are essential for practical stepwise development of complex software architectures.

Gonçalves RC, Sobral JL. 2009. Pluggable Parallelization. HPDC '09: Proceedings of the 18th ACM international symposium on High Performance Distributed Computing. :11–20.

This paper presents the concept of pluggable parallelisation that allows scientists to develop “sequential like” codes that can take advantage of multi-core, cluster and grid systems. In this approach parallel applications are developed by plugging parallelisation patterns/idioms into scientific codes (e.g., “sequential like” codes), softening the move from sequential to parallel programming and promoting the separation between domain specific code and parallelisation issues. Pluggable parallelisation combines three characteristics: 1) parallelisation is performed from “outside to inside”, localising parallelisation concerns into well defined modules, reducing changes required to the domain specific code and avoiding invasive parallelisation of base code; 2) control view is separated from data view promoting a stronger separation of concerns which improves reuse of parallelisation concerns across platforms and enables fine-grained refinements; and 3) abstractions can be composed, supporting the development of more complex patterns based on fine-grained features. This paper presents the concept of pluggable parallelisation and shows how some well-known parallelisation strategies can be implemented in this approach. Results show that this is a feasible approach and performance is competitive with traditional parallel programming.

Sousa E, Gonçalves RC, Neves DT, Sobral JL. 2008. Non-Invasive Gridification through an Aspect-Oriented Approach. Ibergrid '08: Proceedings of the 2nd Iberian Grid Infrastructure Conference. :323–334.

Gonçalves RC, Batory D, Sobral JL, Riché TL. 2017. From software extensions to product lines of dataflow programs. Software and Systems Modeling. 16(4):929-947.

Dataflow programs are widely used. Each program is a directed graph where nodes are computations and edges indicate the flow of data. In prior work, we reverse-engineered legacy dataflow programs by deriving their optimized implementations from a simple specification graph using graph transformations called refinements and optimizations. In MDE-speak, our derivations were PIM-to-PSM mappings. In this paper, we show how extensions complement refinements, optimizations, and PIM-to-PSM derivations to make the process of reverse engineering complex legacy dataflow programs tractable. We explain how optional functionality in transformations can be encoded, thereby enabling us to encode product lines of transformations as well as product lines of dataflow programs. We describe the implementation of extensions in the ReFlO tool and present two non-trivial case studies as evidence of our work’s generality.

Gonçalves RC, Batory D, Sobral JL. 2016. ReFlO: an interactive tool for pipe-and-filter domain specification and program generation. Software and Systems Modeling. 15(2):377-395.

ReFlO is a framework and interactive tool to record and systematize domain knowledge used by experts to derive complex pipe-and-filter (PnF) applications. Domain knowledge is encoded as transformations that alter PnF graphs by refinement (adding more details), flattening (removing modular boundaries), and optimization (substituting inefficient PnF graphs with more efficient ones). All three kinds of transformations arise in reverse-engineering legacy PnF applications. We present the conceptual foundation and tool capabilities of ReFlO, illustrate how parallel PnF applications are designed and generated, and how domain-specific libraries of transformations are developed.
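
To make the idea of transformations on PnF graphs concrete, here is a minimal sketch that models a linear pipeline as a list of box names. It is not ReFlO's data model or API: the functions `refine` and `optimize`, the box names, and the rule that an adjacent `MERGE`/`SPLIT` pair cancels out are all hypothetical, chosen only to show a refinement followed by an optimization. Because refined boxes are spliced directly into the list, flattening happens implicitly in this toy.

```python
def refine(pipeline, box, implementation):
    """Refinement: replace each occurrence of an abstract box by its implementation."""
    result = []
    for stage in pipeline:
        result.extend(implementation if stage == box else [stage])
    return result


def optimize(pipeline, pattern, replacement):
    """Optimization: rewrite each occurrence of the sub-sequence `pattern`."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    result, i, n = [], 0, len(pattern)
    while i < len(pipeline):
        if pipeline[i:i + n] == pattern:
            result.extend(replacement)
            i += n
        else:
            result.append(pipeline[i])
            i += 1
    return result


# Abstract specification and hypothetical parallel implementations of its boxes.
spec = ["FILTER", "SORT"]
implementations = {
    "FILTER": ["SPLIT", "FILTER_CHUNK", "MERGE"],
    "SORT": ["SPLIT", "SORT_CHUNK", "MERGE_SORTED"],
}

design = spec
for box, impl in implementations.items():
    design = refine(design, box, impl)

# Assumed rule: merging and immediately re-splitting a stream is redundant.
optimized = optimize(design, ["MERGE", "SPLIT"], [])
```

The derivation goes from a two-box specification to a six-box parallel design, and the optimization then removes the redundant merge/split pair between the two stages.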

Gonçalves RC. 2015. Parallel Programming by Transformation.

The development of efficient software requires the selection of algorithms and optimizations tailored for each target hardware platform. Alternatively, performance portability may be obtained through the use of optimized libraries. However, currently all the invaluable knowledge used to build optimized libraries is lost during the development process, limiting its reuse by other developers when implementing new operations or porting the software to a new hardware platform. To answer these challenges, we propose a model-driven approach and framework to encode and systematize the domain knowledge used by experts when building optimized libraries and program implementations. This knowledge is encoded by relating the domain operations with their implementations, capturing the fundamental equivalences of the domain, and defining how programs can be transformed by refinement (adding more implementation details), optimization (removing inefficiencies), and extension (adding features). These transformations enable the incremental derivation of efficient and correct by construction program implementations from abstract program specifications. Additionally, we designed an interpretations mechanism to associate different kinds of behavior to domain knowledge, allowing developers to animate programs and predict their properties (such as performance costs) during their derivation. We developed a tool to support the proposed framework, ReFlO, which we use to illustrate how knowledge is encoded and used to incrementally—and mechanically—derive efficient parallel program implementations in different application domains. The proposed approach is an important step to make the process of developing optimized software more systematic, and therefore more understandable and reusable. The knowledge systematization is also the first step to enable the automation of the development process.