For background information on the topic as a whole, scroll to the end of this page.
Available Project Proposals
If you are interested in the general topic of Testing, or if you have your own project idea related to the topic, please contact us directly. Alternatively, you can work on one of the following concrete project proposals:
Generating models for testing from requirements using LLMs (Petra van den Bos, Sioux) Traditionally, requirements for software systems are translated by hand into a set of test cases. This is a lot of work and prone to errors in manual translation. With model-based testing, test cases can be derived automatically from a model that captures all requirements. However, obtaining a model suited for model-based testing from a set of natural-language requirements is still a challenging task, because it requires both interpretation of natural language and modelling skills. In this assignment you will investigate how well Large Language Models can close this gap, i.e. translate natural-language requirements into models for testing. You will work on a case study with requirements provided by Sioux (Eindhoven).
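To make the intended pipeline a bit more concrete, below is a minimal sketch (in Python) of how a single natural-language requirement could be turned into transitions of a testing model. The call_llm wrapper, the prompt, the example requirement and the output format are illustrative assumptions, not part of the Sioux case study.

```python
# Minimal sketch of the requirements-to-model pipeline. The call_llm() wrapper
# stands in for whatever LLM API is used; the requirement text and the
# transition format are invented for illustration.

PROMPT_TEMPLATE = """Translate the following requirement into transitions of a
labelled transition system. Use the format: source_state, input_or_output, target_state.
Mark inputs with '?' and outputs with '!'.

Requirement: {requirement}
"""

def requirement_to_transitions(requirement: str, call_llm) -> list[tuple[str, str, str]]:
    """Ask an LLM to propose LTS transitions for one natural-language requirement."""
    answer = call_llm(PROMPT_TEMPLATE.format(requirement=requirement))
    transitions = []
    for line in answer.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3:                      # keep only well-formed lines
            transitions.append((parts[0], parts[1], parts[2]))
    return transitions

# Example with a stubbed LLM; the real study would validate the generated model
# against the remaining requirements before using it for test generation.
stub = lambda prompt: "idle, ?start, heating\nheating, !ready, idle"
print(requirement_to_transitions(
    "After a start command, the system shall report readiness.", stub))
```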
Case Studies in Probabilistic Testing (Marcus Gerhold) Supervisors: Marcus Gerhold
Probability plays a role in many different systems: cloud computing, robot navigation algorithms, unreliable communication channels, randomized algorithms, communication protocols, etc. When such systems are tested, one checks not only whether the functional behaviour is correct, but also whether the probabilities match the specified behaviour. To do so, one collects test statistics (traces) and uses standard statistical procedures (such as hypothesis testing) to check whether the behaviour is as desired.
The goal of this project is to carry out a case study in probabilistic testing, and see how well it performs in practice.
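As a rough illustration of the statistical step, the sketch below compares observed output frequencies from a set of test traces against the probabilities in a specification, using a chi-squared goodness-of-fit test. The specification and the counts are invented for illustration.

```python
# Minimal sketch of the statistical step in probabilistic testing: compare
# observed output frequencies against the probabilities in the specification.
from scipy.stats import chisquare

spec = {"ack": 0.9, "nack": 0.1}          # specified probabilities after some input
observed = {"ack": 872, "nack": 128}      # counts collected from 1000 test traces

total = sum(observed.values())
f_obs = [observed[o] for o in spec]
f_exp = [spec[o] * total for o in spec]

stat, p_value = chisquare(f_obs, f_exp)   # chi-squared goodness-of-fit test
alpha = 0.05
verdict = "pass" if p_value >= alpha else "fail"
print(f"chi2={stat:.2f}, p={p_value:.3f} -> {verdict}")
```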
Dialogue Under Test: Testing Conversational AI (Marcus Gerhold) Conversational AI systems, such as chatbots and virtual assistants, are becoming increasingly common across various applications, from customer support and education to healthcare. Ensuring their reliability is essential, as failures can lead to user frustration or misinformation.
Model-Based Testing (MBT) is a well-established technique that offers a structured approach to testing (black-box) software systems by using formal models to define expected behaviours and generate test cases.
The aim of this project is to explore how MBT can be applied to conversational AI to ensure consistency and accuracy in interactions, focusing on the unique challenges posed by natural language processing.
Some example research questions are the following:
Dialogue Modeling: How can we create formal models that capture the flow of conversation, handling context changes, user intent, and response generation? In particular, how can the model represent complex dialogue structures found in real-world conversational agents?
Test Case Generation: What strategies can be used to generate meaningful test cases that simulate realistic conversations? How can we ensure that generated test cases cover key dialogue scenarios, including edge cases where the agent may fail to respond appropriately? (Natural language generation)
Response Evaluation: How can the behavior of a conversational agent be validated against the formal model? In particular, what metrics or techniques can be used to assess if the agent’s responses align with the expected outcomes defined in the model? (Natural language processing)
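To give an impression of what a formal dialogue model and generated test cases could look like, here is a minimal sketch; the states, user intents and expected responses are invented placeholders rather than a real chatbot specification.

```python
# Minimal sketch of a dialogue model as a finite state machine and a naive
# test-case generator based on random walks through the model.
import random

# (state, user_intent) -> (expected_response, next_state)
DIALOGUE_MODEL = {
    ("start", "greeting"):          ("greet_back", "menu"),
    ("menu", "ask_opening_hours"):  ("give_hours", "menu"),
    ("menu", "ask_human"):          ("handover", "end"),
    ("menu", "goodbye"):            ("farewell", "end"),
}

def generate_test_case(max_steps: int = 5):
    """Random walk through the model, yielding (intent, expected_response) pairs."""
    state, test_case = "start", []
    for _ in range(max_steps):
        options = [(i, r, s2) for (s, i), (r, s2) in DIALOGUE_MODEL.items() if s == state]
        if not options:
            break
        intent, expected, state = random.choice(options)
        test_case.append((intent, expected))
    return test_case

print(generate_test_case())
```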
The research will consist of the following steps:
- Conducting a literature review to explore existing model-based (testing) techniques that can be applied to conversational AI.
- Developing your own approach or methods based on the chosen research question.
- Formulating an answer to the research question by applying the developed methods to a conversational AI system.
Model-Based Testing for Railways (Arend Rensink, Marcus Gerhold) Supervisors: Arend Rensink and Marcus Gerhold
Have you ever wondered why trains (almost) never collide? Would you believe it if you were told that more than 70% of the Dutch railway tracks are kept safe by relays only? And that this is now slowly being transformed into modern CPU-based controllers? Do you know how such a complex system is tested? If this has piqued your interest, please read on and start your (train) journey into the world of model-based testing!
The aim of this project is to apply and extend a method for Model-Based Testing of railway components. It is carried out in collaboration with Pilz, an international company in industrial automation, who are one of the suppliers of such components.
Railway infrastructure, consisting of tracks, signals, points, stations, tunnels and much more, is a prime example of a complex, safety-critical system. Its correct functioning is essential for the safe and speedy journey of trains. Moreover, the functionality of the system is evolving, for instance in order to adopt the more modern European Rail Traffic Management System (ERTMS).
"Correctness" in the sense above means "compliance to a standard". For railway infrastructure, that standard is specified in EULYNX, a flavour of SysML (which itself can be regarded as a sibling of UML). The standard is an important interface between the national railway infrastructure managers (ProRail in the case of the Netherlands) and suppliers (such as Pilz): the latter are required to show compliance as a precondition for their products to be acceptable to the former. How to show compliance in practice is, however, not prescribed.
In a recent research project, the Formal Methods and Tools group at the UT has developed a model-based testing approach for this purpose. Broadly, this consists of deriving a test suite from a given EULYNX specification, and applying the tests to railway components whose compliance needs to be shown. Since testing is a bug-hunting technique, the coverage of the test suite is all-important for the trust that can be put into the results of this approach. Initial findings are quite promising, but need to be extended.
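As a small illustration of one possible coverage notion, the sketch below computes the fraction of model transitions exercised by a test suite. The toy point-controller transitions and the test suite are invented; the actual project works with EULYNX/SysML models and the existing tooling.

```python
# Minimal sketch of transition coverage: which fraction of the model's
# transitions is exercised by the test suite? All data below is invented.
MODEL_TRANSITIONS = {
    ("left", "?move_right", "moving"),
    ("right", "?move_left", "moving"),
    ("moving", "!position_reached", "left"),
    ("moving", "!position_reached", "right"),
}

TEST_SUITE = [
    [("left", "?move_right", "moving"), ("moving", "!position_reached", "right")],
    [("right", "?move_left", "moving")],
]

covered = {t for test in TEST_SUITE for t in test if t in MODEL_TRANSITIONS}
print(f"transition coverage: {len(covered)}/{len(MODEL_TRANSITIONS)} "
      f"= {len(covered)/len(MODEL_TRANSITIONS):.0%}")
```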
This MSc project is about extending those findings. Concretely, the following steps are or could be involved, depending on how the project develops:
- Creating an adapter for automatically applying an existing test suite (for a point controller) to a concrete implementation developed by Pilz;
- Generating and applying test suites for other subsystems from their respective (EULYNX) specifications, using the available tooling;
- Studying (on a theoretical and practical level) the coverage obtained by the generated test suites;
- Modifying and extending the current test generation tooling based on the outcome of the steps above;
- As part of the previous step, comparing the results with other tools and approaches.
These steps should be carried out in an agile fashion and in consultation with Pilz: to what degree they can all be addressed, or will be superseded by other questions that may emerge in the course of the work, will depend on circumstances.
Background material:
- A case in point: verification and testing of a EULYNX interface (see https://doi.org/10.1145/3528207)
- Conformance in the Railway Industry: Single-Input-Change Testing a EULYNX Controller (see https://doi.org/10.1007/978-3-031-43681-9_15)
Quiescence in Timed Automata (Marcus Gerhold) Supervisors: Marcus Gerhold, Mariëlle Stoelinga
Quiescence, or the absence of outputs, is an important notion in model-based testing. A system-under-test (SUT) must fail a certain test case if an output is required but the SUT does not provide one. Therefore, the absence of outputs is often labelled with a special quiescence label, delta.
In practice, it is difficult (or even impossible) to determine whether a required output will never be given. Therefore, quiescence is implemented by a timeout: if no output occurs within M seconds, it is assumed that no output will occur anymore.
This makes it natural to investigate the notion of quiescence by means of timed automata: suppose that we model everything (i.e. the SUT and the requirements) by means of timed automata, do we then get the same results as with the delta-labelling approach?
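The sketch below illustrates one possible reading of this question: every delta self-loop is replaced by an edge that only becomes enabled once a clock exceeds the timeout M, while observable transitions reset that clock. The data structures and the value of M are ad-hoc assumptions, not Uppaal's actual input format.

```python
# Minimal sketch of the transformation idea: a delta transition in a labelled
# transition system becomes a timed-automaton edge guarded by 'x >= M', and all
# observable transitions reset the clock x. Data structures are ad hoc.
M = 5  # timeout in seconds (an assumed value)

# delta-labelled automaton: state -> list of (label, target)
delta_lts = {
    "s0": [("?req", "s1"), ("delta", "s0")],
    "s1": [("!resp", "s0")],
}

def to_timed_automaton(lts, timeout):
    """Return edges as (source, guard, label, target, clock_reset) tuples."""
    edges = []
    for state, transitions in lts.items():
        for label, target in transitions:
            if label == "delta":
                edges.append((state, f"x >= {timeout}", "quiescence_observed", target, ""))
            else:
                edges.append((state, "true", label, target, "x := 0"))
    return edges

for edge in to_timed_automaton(delta_lts, M):
    print(edge)
```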
Tasks:
- study the concept of quiescence
- study timed automata, and the tool Uppaal (see www.uppaal.org)
- transform delta-labelled automata into timed automata
- compare the results
Contact
Background
Testing is a vital part of the software development lifecycle, ranging from small-scale unit tests for individual methods, to system-wide end-to-end tests, and up to bigger-picture acceptance tests of requirements. Unlike most verification techniques, testing is often applied directly on-site on the actual software system. It is the most frequently applied verification technique in industry. At the same time, testing is costly and often consumes large chunks of a project's budget. To keep this cost in check, testing can be automated to varying degrees.
In FMT, we are interested in this automation. This starts with easing the creation and maintenance of test cases that can be automatically executed on the system. Model-based testing is a more advanced technique where test cases are generated by an algorithm from a model, e.g. a finite state machine or labelled transition system.
Possible research directions are as follows. You could directly improve test generation algorithms for one of the different modelling formalisms. You could also compare existing algorithms. Additionally, you could investigate how to improve testing as part of the software development process, e.g. by linking requirements with (failed) test cases via Behaviour-Driven Development. Furthermore, you could develop or improve methods that measure and express how much has been tested for some set of executed tests.
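As a small, hypothetical example of the Behaviour-Driven Development direction, the sketch below uses the behave library to link a natural-language scenario (normally kept in a .feature file) to executable step definitions, so that a failing test points straight back to the requirement text. The login scenario itself is invented.

```python
# Minimal BDD sketch using the 'behave' library. The scenario text that these
# steps implement would live in a .feature file, for example:
#
#   Scenario: Rejected login            <- requirement written in natural language
#     Given a registered user
#     When they enter a wrong password
#     Then access is denied
from behave import given, when, then

@given("a registered user")
def step_registered_user(context):
    context.user = {"name": "alice", "password": "secret"}

@when("they enter a wrong password")
def step_wrong_password(context):
    context.granted = (context.user["password"] == "not-secret")

@then("access is denied")
def step_access_denied(context):
    # if this assertion fails, the failure is traceable to the scenario text above
    assert not context.granted
```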
In research on testing, theory and practice are relatively close, such that developing new theory and doing case studies are both possible.
Related Courses and Modules
- 202001472 Software Testing and Risk Assessment (STAR)