Software Engineering & Evolution

Software engineering is the most technical part of technical computer science, focused on developing and applying systematic principles common to other kinds of engineering (like mechanical or electrical engineering) to the development of software systems. In particular, it covers:

Software evolution, in particular, is a branch of software engineering focused on studying existing software rather than necessarily creating new software. It covers, among other topics:

A typical research project in software engineering involves implementing a software system at least up to a fully functioning prototype, and performing a feasibility study and/or a user study. A typical software evolution project covers the development of a software system that analyses or transforms another software system. Both often use methodologies from empirical software engineering.

Prerequisites

Related courses

Available Project Proposals

If you are interested in the general topic of Software Engineering and Evolution, or if you have your own project idea related to the topic, please contact us directly. Alternatively, you can also work on one of the following concrete project proposals:

  • Advancements in Refactoring Mining with Deep Learning and Search-Based Algorithms

    Supervisors: Iman Hemati Moghadam, Vadim Zaytsev.

    Gaining insights into the patterns of previously applied refactorings can greatly enhance the accuracy of detecting applied refactorings. By training machine learning models on pre-existing refactorings, we can extract recurring change patterns. However, in order to achieve a highly accurate model, it is crucial to train it on a comprehensive and extensive dataset.

    The first objective of this project is to create an automated mechanism that generates a dataset of refactorings applied in Java applications. Aniche et al. [1] offer a dataset with millions of refactorings and an ML model trained on this dataset to recognize refactoring opportunities. The provided dataset includes class-, method-, and variable-level refactorings but lacks field-level refactorings. Furthermore, the dataset's refactorings were identified using an outdated version of RefactoringMiner [2], which may have missed some applied refactorings. Although RefactoringMiner has improved recall in its latest version, our recent experiments [3] revealed that some applied refactorings are still not detected by it. Consequently, the dataset by Aniche et al. does not encompass all applied refactorings.

    Moreover, in order to develop a model that can effectively identify applied refactorings, we need a more informative dataset that incorporates information from both the original and refactored versions of the program, whereas the dataset provided by Aniche et al. only includes information about the original version. Therefore, we need to create the dataset from scratch. The most challenging aspect of this process lies in accurately assessing the validity of each refactoring included in the dataset.

    Our approach involves employing RefactoringMiner [2] and RefDetect [3] to extract the refactorings that have been applied in the Java applications within the dataset compiled by Aniche et al. [1]. RefactoringMiner demonstrates high precision, so the refactorings it identifies can be considered valid. However, its recall is lower than that of RefDetect, which we will employ to identify refactoring instances missed by RefactoringMiner. Given RefDetect's strong recall, we are confident that any changes not recognized as refactorings by RefDetect can be classified as non-refactoring modifications. Nevertheless, changes that RefDetect identifies as refactorings but RefactoringMiner does not must still be verified; we call these ambiguous changes. Although a manual evaluation is possible, it is time-consuming and error-prone, so an automated approach for validating ambiguous changes is needed. To address this, we propose exploring the possibility of automatically labelling the refactorings using a machine-learning algorithm or a combination of machine-learning and search-based techniques. Further information regarding the challenges encountered and the approaches proposed to address them is provided in the detailed project description.
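
    As a minimal sketch of the labelling logic described above (written in Rust only for concreteness; the enum, function, and boolean inputs are hypothetical, and the actual pipeline operates on Java change histories extracted by the two tools):

        // Hypothetical sketch: RefactoringMiner (high precision) confirms refactorings,
        // RefDetect (high recall) rules out non-refactorings, and the remaining
        // disagreements are the "ambiguous changes" that need automated validation.
        #[derive(Debug, PartialEq)]
        enum ChangeLabel {
            Refactoring,    // confirmed by RefactoringMiner
            NonRefactoring, // not detected by RefDetect (high recall), so treated as non-refactoring
            Ambiguous,      // found only by RefDetect: needs ML / search-based validation
        }

        fn label_change(found_by_refactoringminer: bool, found_by_refdetect: bool) -> ChangeLabel {
            match (found_by_refactoringminer, found_by_refdetect) {
                (true, _) => ChangeLabel::Refactoring,
                (false, true) => ChangeLabel::Ambiguous,
                (false, false) => ChangeLabel::NonRefactoring,
            }
        }

        fn main() {
            assert_eq!(label_change(false, true), ChangeLabel::Ambiguous);
            println!("{:?}", label_change(true, true)); // Refactoring
        }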

    After establishing the aforementioned dataset, the next objective of this project is to leverage it to train a model that can proficiently recognize applied refactorings in Java applications, or to augment RefDetect [3] so that it can make more informed decisions when identifying refactorings.

    [1] Aniche, M., Maziero, E., Durelli, R., & Durelli, V. H. (2020). The effectiveness of supervised machine learning algorithms in predicting software refactoring. IEEE Transactions on Software Engineering, 48(4), 1432-1450.

    [2] Tsantalis, N., Ketkar, A., & Dig, D. (2020). RefactoringMiner 2.0. IEEE Transactions on Software Engineering, 48(3), 930-950.

    [3] Moghadam, I. H., Cinnéide, M. Ó., Zarepour, F., & Jahanmir, M. A. (2021). RefDetect: A multi-language refactoring detection tool based on string alignment. IEEE Access, 9, 86698-86727.

  • Evaluating the Robustness of Pre-Trained Code Embedding Methods for Complex Code Modifications

    Supervisors: Iman Hemati Moghadam, Vadim Zaytsev.

    Many pre-trained code embedding methods exist, and determining which approach is most resilient in the presence of code modifications has consistently intrigued researchers. Although some studies have examined and compared a selection of prevalent embedding models such as CodeBERT, Codex, Code2vec, and Code2seq, their emphasis has predominantly centred on generating adversarial samples by renaming entities with the Rename refactoring. A few research works, however, have also applied transformations such as loop exchange, try-catch insertion, replacing switch statements with if statements, and swapping unrelated statements.

    However, in the majority of software development activities where pre-trained models are used (such as code completion, code understanding, bug detection and fixing, and refactoring assistance), the modifications extend well beyond such simple changes. Therefore, it is necessary to evaluate the robustness of the existing pre-trained models on complex code modifications. To accomplish this goal, as an initial step, we will generate semantically equivalent code samples by applying refactorings such as Extract Method and Inline Method, and subsequently evaluate the performance of the different models on them. The findings from this research can offer valuable insights for selecting a robust model for future projects, thereby enhancing decision-making processes.

  • Exploring Error Handling in Rust Programs

    Supervisor: Fernando Castor

    The Rust programming language aims to make systems programming efficient and safe at the same time by helping developers build programs that are safe by construction. The language is statically typed and, with the help of its compiler, supports safe access to memory without the need for a garbage collector or runtime system. It also provides scoped concurrency while avoiding state sharing, with exit synchronization for groups of threads. According to the 2023 StackOverflow developer survey (https://survey.stackoverflow.co/2023/), it is the most admired technology among survey respondents and has been so for many years.

    One thing that Rust does not have, though, is a dedicated exception mechanism for signaling and handling errors, unlike a number of popular programming languages, such as Java, C++, Swift, and Python. In Rust, unrecoverable errors are signaled by the panic! macro. Computations that may produce errors are represented by values of Result, an enumerated type that encapsulates both correct and erroneous results. These values are just regular Rust values and are not propagated automatically, unlike exceptions in other languages. On the one hand, this means that Rust avoids additional runtime infrastructure to perform stack unwinding during exception propagation. On the other hand, developers must explicitly check whether the output of a function is an error or not.
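
    As a minimal illustration of the two mechanisms just described (the function and values below are made up for the example, not taken from any particular project):

        use std::num::ParseIntError;

        // Recoverable errors: the result is an ordinary value of the Result enum,
        // which the caller has to inspect explicitly.
        fn parse_port(s: &str) -> Result<u16, ParseIntError> {
            s.trim().parse::<u16>()
        }

        fn main() {
            match parse_port("8080") {
                Ok(port) => println!("listening on {port}"),
                Err(e) => eprintln!("invalid port: {e}"),
            }

            // Unrecoverable errors: panic! unwinds (or aborts) instead of returning a value.
            if parse_port("not-a-number").is_err() {
                panic!("configuration is unusable, giving up");
            }
        }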

    Previous work has shown that, in a number of languages, developers give less attention to code that handles errors than to other parts of the code. They test error handling code less [1], capture errors without doing anything with them [2,3], capture the incorrect errors [4], fail to account for potential errors [5], and sometimes simply do not use the language's error handling mechanism [6]. Problems with error handling are commonplace even in languages that do not include specific mechanisms for handling errors [7].

    In this project we would like to address a high-level research question:

    RQ How do Rust programmers handle errors? How much code is dedicated to that?

    This question can be decomposed into a number of more specific research questions:

    RQ1 How are errors typically handled in Rust programs? Are they often ignored, as in other languages?

    RQ1.1 Is it common to have long sequences (in terms of method calls) of chained, explicit error handling, the kind of code that would not be there with exception propagation? (A sketch of such chaining follows this list of questions.)

    RQ2 What do developers think of Rust error handling? Is it better than C's? Better than exceptions?

    RQ3 Do automated tests for Rust programs test exceptional paths?

    RQ4 What are error handling bugs in Rust like?

    RQ5 How are errors handled in the presence of scoped concurrency?
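
    To make RQ1.1 concrete, here is a small sketch (with made-up functions and error types) of the explicit, per-call error plumbing that exception propagation would otherwise hide:

        use std::fs;
        use std::num::ParseIntError;

        #[derive(Debug)]
        enum ConfigError {
            Io(std::io::Error),
            Parse(ParseIntError),
        }

        // Every fallible call in the chain surfaces its error explicitly; the ?
        // operator converts it and returns early instead of unwinding the stack.
        fn read_retry_limit(path: &str) -> Result<u32, ConfigError> {
            let text = fs::read_to_string(path).map_err(ConfigError::Io)?;
            let limit = text.trim().parse::<u32>().map_err(ConfigError::Parse)?;
            Ok(limit)
        }

        fn main() {
            match read_retry_limit("retry_limit.txt") {
                Ok(n) => println!("retry limit: {n}"),
                Err(e) => eprintln!("could not read configuration: {e:?}"),
            }
        }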


    References

    [1] Felipe Ebert, Fernando Castor, Alexander Serebrenik. An exploratory study on exception handling bugs in Java programs. J. Syst. Softw. 106: 82-101 (2015)

    [2] Nathan Cassee, Gustavo Pinto, Fernando Castor, Alexander Serebrenik. How Swift developers handle errors. MSR 2018: 292-302

    [3] Bruno Cabral, Paulo Marques. Exception Handling: A Field Study in Java and .NET. ECOOP 2007: 151-175

    [4] Nélio Cacho, Thiago César, Thomas Filipe, Eliezio Soares, Arthur Cassio, Rafael Souza, Israel García, Eiji Adachi Barbosa, Alessandro Garcia. Trading robustness for maintainability: an empirical study of evolving C# programs. ICSE 2014: 584-595

    [5] Juliana Oliveira, Deise Borges, Thaisa Silva, Nélio Cacho, Fernando Castor. Do Android developers neglect error handling? A maintenance-centric study on the relationship between Android abstractions and uncaught exceptions. J. Syst. Softw. 136: 1-18 (2018)

    [6] Rodrigo Bonifácio, Fausto Carvalho, Guilherme Novaes Ramos, Uirá Kulesza, Roberta Coelho. The use of C++ exception handling constructs: A comprehensive study. SCAM 2015: 21-30

    [7] Magiel Bruntink, Arie van Deursen, Tom Tourwé. Discovering faults in idiom-based exception handling. ICSE 2006: 242-251

  • High-Performance Simulations for High-Energy Physics Experiments

    Supervisors: Uraz Odyurt, Vadim Zaytsev.

    The role of simulation and synthetic data generation in High-Energy Physics (HEP) research is profound. While physics-accurate simulation frameworks are available to provide the most realistic data syntheses, these tools are computationally demanding. Additionally, the output of physics-accurate simulations is hard to comprehend, manipulate, and work with, as the data is rather close to the real case. These simulations also consider accurate models of real-world detectors, which is another limitation.

    Parametric and complexity-aware simulation frameworks, on the other hand, can redefine the complexity space in drastically simplified terms and generate complexity-reduced data sets. It is also possible to consider a variety of detector models for these simulations. The applications of complexity-reduced simulations and data are numerous. We will be focusing on the role of such data as an enabler for Machine Learning (ML) model design research.

    This project aims to extend the existing REDVID simulation framework by adding new features.

    (Read the full project description)

  • Mining Questions about Software Energy Consumption

    Supervisor: Fernando Castor

    Nowadays, thanks to the rapid proliferation of mobile phones, tablets, and unwired devices in general, energy efficiency is becoming a key software design consideration, since energy consumption is closely related to battery lifetime. It is also of increasing interest in the non-mobile arena, such as data centers and desktop environments. Energy-efficient solutions are highly sought after across the compute stack, with more established results through innovations in hardware/architecture [1,2], operating systems [3], and runtime systems [4]. In recent years, there has been growing interest in studying energy consumption from the higher layers of the compute stack, and most of these studies focus on application software [5,6,7,8]. These approaches complement prior hardware/OS-centric solutions, so that improvements at the hardware/OS level are not cancelled out at the application level, e.g., due to misuses of language/library/application features.

    We believe a critical dimension of further improving the energy efficiency of software systems is to understand how software developers think. The needs of developers and the challenges they face may help energy-efficiency researchers stay focused on real-world problems. The collective wisdom shared by developers may serve as a practical guide for future energy-aware and energy-efficient software development. The conceptually incorrect views they hold may inspire educators to develop more up-to-date curricula.

    The goal of this work is to obtain a deeper understanding of (i) whether application programmers are interested in software energy consumption, and, if so, (ii) how they are dealing with energy consumption issues. Specifically, the questions we are trying to answer are:

    RQ1 What are the distinctive characteristics of energy-related questions?

    RQ2 What are the most common energy-related problems faced by software developers?

    RQ3 According to developers, what are the main causes for software energy consumption?

    RQ4 What solutions do developers employ or recommend to save energy?

    To answer these questions, we leverage data from StackOverflow, the most popular software development Q&A website, as well as issues reported in the issue trackers of real open source software projects.


    References

    [1] L. Bircher and L. John. Analysis of dynamic power management on multi-core processors. In ICS, 2008.

    [2] A. Iyer and D. Marculescu. Power efficiency of voltage scaling in multiple clock, multiple voltage cores. In ICCAD, 2002.

    [3] R. Ge, X. Feng, W.-c. Feng, and K. Cameron. CPU MISER: A performance-directed, run-time system for power-aware clusters. In ICPP, 2007.

    [4] H. Ribic and Y. D. Liu. Energy-efficient work-stealing language runtimes. In ASPLOS, 2014.

    [5] Wellington Oliveira, Bernardo Moraes, Fernando Castor, João Paulo Fernandes. Analyzing the Resource Usage Overhead of Mobile App Development Frameworks. EASE 2023: 152-161

    [6] Wellington Oliveira, Renato Oliveira, Fernando Castor, Gustavo Pinto, João Paulo Fernandes. Improving energy-efficiency by recommending Java collections. Empir. Softw. Eng. 26(3): 55 (2021)

    [7] Ding Li, Shuai Hao, William G. J. Halfond, Ramesh Govindan. Calculating source line level energy information for Android applications. ISSTA 2013: 78-89

    [8] Stefanos Georgiou, Maria Kechagia, Tushar Sharma, Federica Sarro, Ying Zou. Green AI: Do Deep Learning Frameworks Have Different Costs? ICSE 2022: 1082-1094

  • Recovering Refactoring Opportunities Using Pre-Trained Language Models

    Supervisors: Iman Hemati Moghadam, Vadim Zaytsev.

    Metric-based approaches are the most commonly used technique for identifying refactoring opportunities: they calculate a particular set of code metrics and apply predefined thresholds to each metric. However, this approach faces three key challenges. Firstly, there is no universally accepted methodology for selecting metrics. Secondly, standardized definitions for code metrics are lacking. Thirdly, the accuracy of metric-based approaches depends heavily on choosing appropriate thresholds.
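
    As a minimal illustration of the metric-plus-threshold idea (in Rust for concreteness; the metrics and threshold values below are arbitrary examples chosen for the sketch, not ones prescribed by this project):

        // Toy metric-based detector: flag a method as a refactoring candidate when
        // simple size metrics exceed fixed thresholds. Picking the thresholds well
        // (here 50 LOC and 5 parameters, chosen arbitrarily) is exactly the problem
        // discussed above.
        struct MethodMetrics {
            lines_of_code: usize,
            parameter_count: usize,
        }

        fn is_refactoring_candidate(m: &MethodMetrics) -> bool {
            m.lines_of_code > 50 || m.parameter_count > 5
        }

        fn main() {
            let m = MethodMetrics { lines_of_code: 120, parameter_count: 2 };
            println!("candidate: {}", is_refactoring_candidate(&m));
        }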

    Machine learning techniques provide effective solutions to overcome the aforementioned limitations. In certain approaches, the code snippet is transformed into a vector of source code metrics, which is then used to train ML classifiers. However, this representation fails to preserve the semantics and structure of the code. Conversely, pre-trained language models like CodeBERT and Codex have been trained on vast amounts of code and have learned to understand the syntax, semantics, and context of programming languages. This valuable knowledge can be transferred to the task of recovering refactoring opportunities, enabling good performance even with limited task-specific training data.

    The primary objective of this project is to improve the performance of a pre-trained language model (e.g., Codex, CodeBERT, etc.) in identifying opportunities for refactoring within Java applications. To accomplish this, we will start by fine-tuning the chosen language model using the existing dataset of Aniche et al. [1]. Then, we will utilize the enhanced model to identify potential refactoring opportunities in Java applications. To assess the effectiveness of our approach, we will compare its accuracy with that of the model proposed by Aniche et al. [1].

    [1] Aniche, M., Maziero, E., Durelli, R., & Durelli, V. H. (2020). The effectiveness of supervised machine learning algorithms in predicting software refactoring. IEEE Transactions on Software Engineering, 48(4), 1432-1450.

  • Refactoring Recommendations Using Pre-Trained Language Models

    Supervisors: Iman Hemati Moghadam, Vadim Zaytsev.

    While current machine learning models, such as the one developed by Aniche et al. [1], can identify opportunities for refactoring, they do not provide specific guidance on how to apply the refactorings. For instance, while existing models can recognize the necessity of applying an Extract Method refactoring to a given method, they do not specify which portion of the method should be extracted (e.g., its start and end tokens).

    The goal of this project is to develop algorithms that not only identify parts of code that require refactorings but also recommend appropriate refactoring techniques. The focus of this project will be on two commonly used refactoring types: Extract Method and Inline Method. These two refactoring types are frequently employed by developers, and manually applying them is challenging.

    Our proposed approach involves leveraging state-of-the-art deep learning models specifically designed for code analysis, including GraphCodeBERT, CuBERT, CodeGPT, Codex, etc. These models have undergone pre-training on extensive datasets, enabling them to comprehend the syntax, semantics, and context of programming languages effectively. Moreover, they can be fine-tuned for our task. However, before fine-tuning the chosen model, we need to construct a dataset that comprises both pre-refactoring and post-refactoring code snippets. To accomplish this, we will extend the dataset of Aniche et al. [1], which encompasses thousands of Java applications. The resulting dataset will comprise code snippets from both before and after the application of refactorings (i.e., Extract Method and Inline Method), and will be employed in the fine-tuning process of the selected pre-trained model.

    [1] Aniche, M., Maziero, E., Durelli, R., & Durelli, V. H. (2020). The effectiveness of supervised machine learning algorithms in predicting software refactoring. IEEE Transactions on Software Engineering, 48(4), 1432-1450.

Contact