Software Engineering & Programming Languages

For background information on the topic as a whole, scroll to the end of this page.

Available Project Proposals

If you are interested in the general topic of Software Engineering and Programming Languages, or if you have your own project idea related to the topic, please contact us directly. Alternatively, you can work on one of the following concrete project proposals:

  • AI-Assisted Refactoring to Improve Software Energy Efficiency (Fernando Castor)

    This research project explores the use of artificial intelligence to enhance software energy efficiency through automated refactoring. Previous work [1] has shown that large language models, in their current state, have limited success in generating programs that are more energy-efficient than human-written solutions, even if they were designed with performance as an important requirement. At the same time, research studying the usefulness of LLMs in program refactoring tends to focus on improving maintainability and on performing traditional refactorings such as renaming, moving, and extracting methods [2,3]. In this work, we aim to examine how different language models and prompting approaches can be used to make preexisting programs more energy-efficient. In terms of prompting approaches, this includes analyzing what kind of information can be useful to ensure that (i) performance is improved and (ii) behavior is preserved. This is still a largely unexplored research area [4]. We plan to examine programs written in different languages, where candidates include Java, C++, CUDA, Go, Haskell, and Rust. Furthermore, we intend to give particular attention to concurrent and parallel programs. Previous work suggests that they may be interesting candidates due to a less direct relationship between their performance and energy use.

    [1] Lola Solovyeva, Sophie Weidmann, Fernando Castor: AI-Powered, But Power-Hungry? Energy Efficiency of LLM-Generated Code. CoRR abs/2502.02412 (2025)

    [2] A. Shirafuji, Y. Oda, J. Suzuki, M. Morishita and Y. Watanobe: Refactoring Programs Using Large Language Models with Few-Shot Examples. APSEC 2024: 151-160.

    [3] Dorin Pomian, Abhiram Bellur, Malinda Dilhara, Zarina Kurbatova, Egor Bogomolov, Timofey Bryksin, Danny Dig: Next-Generation Refactoring: Combining LLM Insights and IDE Capabilities for Extract Method. ICSME 2024: 275-287.

    [4] Pooja Rani, Jan-Andrea Bard, June Sallou, Alexander Boll, Timo Kehrer, Alberto Bacchelli: Can We Make Code Green? Understanding Trade-Offs in LLMs vs. Human Code Optimizations. CoRR abs/2503.20126 (2025)

  • Atoms of Confusion in Rust (Fernando Castor)

    Atoms of Confusion [1] are small, misleading code patterns that can cause developers to misinterpret program behavior, leading to potential bugs and security vulnerabilities. More specifically [2], atoms of confusion are code patterns that are (i) precisely identifiable, (ii) verifiably likely to cause confusion, (iii) replaceable by a functionally equivalent code pattern that is less likely to cause confusion, and (iv) indivisible. An example is the use of assignment expressions in loop and conditional statement conditions in languages such as C, Java, and JavaScript.

        if (V1 = m()) { ... }

    It is possible to write a functionally-equivalent alternative version that is arguably less confusing:

        V1 = m();

        if (V1) { ... }

    Multiple studies [3,4,5] have shown that atoms of confusion potentially lead to programmer mistakes and may require more time to understand than functionally-equivalent alternatives. In addition, they are widely used in real-world programs, and have been shown to be associated with post-release defects and extra code comments [6], although evidence in this regard is contradictory [7].
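    Property (i) above, precise identifiability, means that detecting an atom can often be reduced to a syntactic check. The following is a minimal sketch of such a check, not a tool from the cited studies. It uses Python's assignment expressions (:=) as a stand-in for the C pattern shown earlier, because Python's standard ast module keeps the example self-contained. Python is not one of the languages named above, and the function name find_assignment_in_condition is hypothetical.

```python
import ast


def find_assignment_in_condition(source: str) -> list[int]:
    """Return line numbers of if/while statements whose condition
    contains an assignment expression (the := operator)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While)):
            # Search only the condition subtree, not the statement body.
            if any(isinstance(sub, ast.NamedExpr) for sub in ast.walk(node.test)):
                hits.append(node.lineno)
    return sorted(hits)


# Analogue of the pattern above: assignment inside the condition.
confusing = """
if (v1 := m()):
    pass
"""

# Analogue of the alternative: assignment first, then the test.
clearer = """
v1 = m()
if v1:
    pass
"""

print(find_assignment_in_condition(confusing))  # [2]
print(find_assignment_in_condition(clearer))    # []
```

    The check only parses the source and never runs it, so m does not need to be defined. A real detector for Rust would need a Rust parser and a validated catalog of patterns, which is what the project sets out to produce.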

    Since different languages have diverse constructs, guarantees, and typical use cases, atoms that have been experimentally verified for one language do not necessarily apply to others [3,4,5]. In this work, we would like to define which code patterns are candidates to be considered atoms of confusion in the Rust programming language and investigate their prevalence in real-world Rust code. Rust is an interesting language for studying this topic because, like C, the language targeted by seminal work on atoms of confusion, it is a systems programming language aimed at writing high-performance programs. Unlike C, its design philosophy avoids many of the caveats inherent to writing C code, including many atoms of confusion, such as the one presented above. At the same time, its sophisticated type system and ownership model, aimed at making programs less buggy and reducing security vulnerabilities, may hinder the readability of programs written in it. In this project we would like to define a catalog of atom candidates in the Rust language, validate these candidates by means of experiments, build a tool capable of identifying atoms in real Rust programs, and finally study their prevalence in real-world Rust projects.


    References

    [1] Gopstein, D., Iannacone, J., Yan, Y., DeLong, L., Zhuang, Y., Yeh, M.K., Cappos, J., 2017. Understanding misunderstandings in source code. In: ESEC/SIGSOFT FSE. ACM, pp. 129–139.

    [2] Fernando Castor, 2018. Identifying Confusing Code in Swift Programs. In Proceedings of the VI CBSoft Workshop on Visualization, Evolution, and Maintenance. São Carlos, Brazil.

    [3] José Aldo Silva da Costa, Rohit Gheyi, Fernando Castor, Pablo Roberto Fernandes de Oliveira, Márcio Ribeiro, Baldoino Fonseca, 2023. Seeing confusion through a new lens: on the impact of atoms of confusion on novices' code comprehension. Empir. Softw. Eng. 28(4): 81.

    [4] Adriano Torres, Caio Oliveira, Márcio Vinicius Okimoto, Diego Marcilio, Pedro Queiroga, Fernando Castor, Rodrigo Bonifácio, Edna Dias Canedo, Márcio Ribeiro, Eduardo Monteiro, 2023. An Investigation of confusing code patterns in JavaScript. J. Syst. Softw. 203: 111731.

    [5] Chris Langhout, Maurício Aniche, 2021. Atoms of Confusion in Java. ICPC 2021: 25-35.

    [6] Gopstein, D., Zhou, H.H., Frankl, P.G., Cappos, J., 2018. Prevalence of confusing code in software projects: atoms of confusion in the wild. In: MSR. ACM, pp. 281–291.

    [7] Guoshuai Shi, Farshad Kazemi, Michael W. Godfrey, Shane McIntosh, 2024. Reevaluating the Defect Proneness of Atoms of Confusion in Java Systems. ESEM 2024: 154-164.

  • Design your own PL (Peter Lammich)

    Design your own programming language, trying to combine fancy features from recent modern programming languages in a reasonable way. Use LLVM as back-end, to get all standard optimizations and code-generation (almost) for free.

  • DSLs for Networking Paradigms (Georgiana Caltais)

    Are you a graduate student eager to tackle real-world problems in networking and data processing? Join us for an exciting Master's thesis project focusing on Domain-Specific Languages (DSLs) for Software Defined Networking (SDN) and big data applications!

    SDN-based technologies, embraced by industry leaders like Google and Intel IT, can play an important role in solving issues concerning data processing in cloud data centers, optimisations and data delivery. SDN is an emerging approach to network programming, in a setting where the network control is decoupled from the forwarding functions. This makes the network control directly programmable and more flexible to change.

    In this project, you'll have the chance to advance DSLs for SDN-based technologies, helping to solve critical issues concerning SDNs’ reliability. Whether you're into theory or hands-on work, there's a project suited for you:

    • If you prefer hands-on experimentation, roll up your sleeves and delve into devising algorithms for extracting and analysing SDNs based on real datasets, while gaining valuable real-world experience along the way.
    • For the theory enthusiasts, explore how mathematical frameworks can drive safety and robustness in SDNs. Furthermore, work on liability frameworks for SDNs, to answer questions such as: “What caused my network to fail?” or “Who is responsible for that packet loss?”.
    • Bring your ideas and expertise to the table as we tackle some of the most pressing challenges in networking and data processing.

    Your Master's thesis could be the key to unlocking groundbreaking advancements in SDN and big data technologies!

  • Hyperparameters' Impact on the Energy Consumption of LLMs Supporting Software Development (Fernando Castor)

    Generative AI and coding assistants are revolutionizing software development, but their energy demands are rapidly escalating. At the same time, concerns about data privacy have driven many developers and organizations to consider deploying their own local AI assistants. The widespread adoption of large language models (LLMs) presents a critical trade-off: balancing energy consumption with task accuracy.

    This study aims to investigate the impact of hyperparameter adjustments, such as temperature, top-p, and max_output_tokens, on energy consumption and accuracy in two common software development tasks: code generation and bug fixing. By evaluating a diverse set of language models on an AI-specific GPU to replicate real-world scenarios, we aim to provide actionable insights that guide developers in optimizing the deployment of LLMs for efficiency and performance.
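    To make these hyperparameters concrete: in common LLM decoding setups, temperature rescales the next-token logits before the softmax, top-p (nucleus sampling) keeps only the most probable tokens whose cumulative probability reaches p, and max_output_tokens caps the length of the generated text. The sketch below illustrates the first two on a toy three-token distribution. It follows the standard definitions and is not the study's experimental setup; the function names and logit values are made up for illustration.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens

for t in (0.5, 1.0, 2.0):
    # Lower temperature concentrates probability on the top token.
    print(t, [round(p, 3) for p in apply_temperature(logits, t)])

# With top_p = 0.9 the least likely of the three tokens is dropped.
print([round(p, 3) for p in top_p_filter(apply_temperature(logits, 1.0), 0.9)])
```

    Lower temperature and smaller top-p both narrow the set of likely continuations. How such settings affect output length, accuracy, and energy use is the question the study investigates.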

  • Model Transformation for Dutch sewers (Arend Rensink, Hajo Molegraaf)

    Supervisors: Arend Rensink, Hajo Molegraaf, Luís Ferreira Pires

    The company Rolsch Assetmanagement provides software support for managing networks for drinking water and sewage. This involves, among other things, communicating about such networks with many different clients, most of whom have their own databases with specific schemas and technologies. These have to be transformed to Rolsch's own representation, on which computations take place. These transformations are bidirectional. Currently, they are hand-coded in Python, on a case-by-case (i.e., customer-by-customer) basis. Though performance is not a prime concern, the databases themselves are large, involving extensive pipe networks with millions of segments. Due to changes in the customer base, evolution of their schemas, but also evolution of the Rolsch-internal representation, these transformations have to be continually maintained.

    In collaboration with RioNed, the Dutch overarching association for urban water control, Rolsch is initiating a project to collect the sewer inspection data from all Dutch municipalities, with the purpose of providing a transparent and unifying view that can be used by third parties, for instance to provide training data for automatic defect detection. This will create a specific case of the kind of transformation described above.

    The aim of this Master project is to apply the principles of Model Transformation (MT) to this setting. The idea is to specify the required transformations declaratively, in a rule-based paradigm, after which the code can be generated. If successful, this approach could eventually replace the hand-coded transformations with declaratively specified ones, relieving much of the maintenance burden because this can be lifted to the level of the specifications, and opening up the possibility to reason about correctness and completeness of the transformations.
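    As a rough illustration of the difference between hand-coded and rule-based transformations, the sketch below states a bidirectional mapping once, as a table of rules, and derives both directions from it. Everything in it is an assumption made for illustration: the field names, the unit conversion, and the rule format are hypothetical and do not reflect Rolsch's actual schemas. Unlike the approach described above, the rules are interpreted directly here rather than used to generate code.

```python
# Each rule: (customer field, internal field, forward conversion, backward conversion).
# All field names and conversions are hypothetical.
RULES = [
    ("pipe_id", "segment_id", str, str),
    ("length_cm", "length_m", lambda cm: cm / 100, lambda m: round(m * 100)),
    ("material", "material", str.upper, str.lower),
]


def to_internal(record):
    """Customer representation -> internal representation."""
    return {dst: fwd(record[src]) for src, dst, fwd, _ in RULES}


def to_customer(record):
    """Internal representation -> customer representation."""
    return {src: bwd(record[dst]) for src, dst, _, bwd in RULES}


customer = {"pipe_id": "P-17", "length_cm": 1250, "material": "pvc"}
internal = to_internal(customer)

print(internal)
print(to_customer(internal) == customer)  # round trip restores the record
```

    A schema change then means editing one rule rather than two hand-written functions, and properties such as the round trip can be checked against the rule table itself.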

  • The Real Meaning of State Machine Diagrams (Arend Rensink, Marcus Gerhold)

    Supervisors: Arend Rensink, Marcus Gerhold

    State Machine Diagrams (SMDs) are a well-known, powerful graphical modelling language for describing the expected behaviour of large, complex systems. Its power lies in the availability of multiple features for the compact representation of frequently occurring behavioural patterns, such as concurrency, non-determinism, multiple exit points, triggers and synchronisation. However, at the same time, this wealth of features makes it hard to analyze all their possible combinations and to be sure that the expected behaviour is under all circumstances well-understood.

    This project aims to provide a semantics for SMDs by fully formalising an existing transformation of SMDs to Finite State Machines and implementing it declaratively (using Graph Transformation), complemented by an operational semantics that provides a criterion for the correctness of the transformation (also using Graph Transformation).

    For further reference see

Contact

Background

Software engineering is the most technical part of technical computer science, focused on developing and applying systematic principles common to other kinds of engineering (like mechanical or electrical engineering) to the development of software systems. In particular, it covers:

Software evolution, in particular, is a branch of software engineering focused on studying existing software and not necessarily creating new software. It covers, among other topics:

A typical research project in software engineering involves implementing a software system, at least up to a fully functioning prototype, and performing a feasibility study and/or a user study. A typical software evolution project covers the development of a software system that analyses or transforms another software system. Both often use methodologies from empirical software engineering.

There are many programming languages out there, with different features for solving different problems. Most of them can be roughly categorized by the paradigms, i.e., approaches to programming, that they support. Three well-known paradigms are:

Note, however, that many programming languages support more than one paradigm, or are somewhere in between. For example, Python also supports functional programming, and C++ and Java have evolved from purely imperative languages to also support functional features, like lambda abstractions. An overview of many paradigms and languages can be found in this video, which is based on Peter van Roy's overview.
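As a small illustration of one language supporting more than one paradigm, the sketch below computes the same result in Python twice: first in an imperative style, with a loop and a mutable accumulator, and then in a functional style built from lambda abstractions. The function names and the input data are made up for illustration.

```python
from functools import reduce


def sum_even_squares_imperative(numbers):
    """Imperative style: explicit loop that updates a mutable variable."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


def sum_even_squares_functional(numbers):
    """Functional style: compose filter, map, and reduce without mutation."""
    evens = filter(lambda n: n % 2 == 0, numbers)
    squares = map(lambda n: n * n, evens)
    return reduce(lambda acc, sq: acc + sq, squares, 0)


numbers = [1, 2, 3, 4, 5, 6]
print(sum_even_squares_imperative(numbers))  # 56
print(sum_even_squares_functional(numbers))  # 56
```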

In this topic, we cover questions about programming languages, including their design, features, and implementations. Typical projects in this topic

Prerequisites

Related modules and courses