Machine Learning

Machine learning makes it possible to learn from the enormous amounts of data gathered as a direct or indirect product of a process, whether it be an industrial product line, efficient farming for food production, prediction of bacterial growth, or any other industrial, healthcare-related, or technological endeavor. Much as humans do, machine learning methods extract patterns from data when enough examples are provided. Reinforcement learning is an action-oriented type of learning in which an agent tries to maximise a given reward signal without being told explicitly which actions to take. This type of learning is often combined with other types of machine learning. For instance, it is widely combined with deep learning in scenarios with large state spaces.

The topic of machine learning leads to projects of varying nature, such as theoretical work (algorithm development, computational efficiency, convergence of machine learning algorithms, formal representation), algorithm training for classification or regression (see some example projects below), predictive maintenance, and reinforcement learning.

Available Project Proposals

If you are interested in the general topic of Machine Learning, or if you have your own project idea related to the topic, please contact us directly. Alternatively, you can work on one of the following concrete project proposals:

  • Point cloud classification, segmentation, and visualization

    Supervisor: Faizan Ahmed

    Point clouds are sets of spatial data points captured by 3D scanning techniques such as lidar. These point clouds contain many millions of data points, resulting in 3D representations of the railway environment. Point cloud data can be used to create machine learning models that automatically classify the objects in rail infrastructure.

    The main research question is: How to reliably segment, classify, and visualize point clouds for railway catenary systems with scant computational power, memory, and ground truth labels?

    The student can, however, focus on one or a combination of the topics below:

    • create an end-to-end pipeline for point cloud visualization using existing libraries
    • visualize model output (or intermediate processing) to explore interpretability/explainability
    • application of XAI techniques to point clouds
    • leverage existing trained models for point cloud labeling
    • pre-process point clouds (down-sampling, augmentation, filtering, voxelization, etc.) for accurate model training
    • train deep learning models for point cloud segmentation and object detection
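
    As an illustration of the pre-processing topic, the sketch below shows voxel-grid down-sampling, one common way to reduce a point cloud before training. It uses only the Python standard library; the function name `voxel_downsample`, the voxel size, and the toy cloud are assumptions made for this example and are not part of the project material. In practice a dedicated point cloud library would likely be used.

```python
from collections import defaultdict
from math import floor


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic voxel by their centroid."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Group points by the integer index of the voxel that contains them.
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))

    # Keep one representative point (the centroid) per occupied voxel.
    centroids = []
    for members in buckets.values():
        n = len(members)
        centroids.append(tuple(sum(coord) / n for coord in zip(*members)))
    return centroids


# Hypothetical toy cloud: two points share a voxel, one lies far away.
cloud = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (5.0, 5.0, 5.0)]
reduced = voxel_downsample(cloud, voxel_size=1.0)
print(len(cloud), "->", len(reduced), "points")
```

    A larger voxel size removes more points and lowers memory use, at the cost of geometric detail; choosing that trade-off is part of the research question.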

  • Quantitative analysis of the diversity index in student teams

    Supervisor: Yeray Barrios Fleitas

    How can we quantitatively compare two different teams? We first need to know what we are comparing. Suppose we want to know how diverse a specific team is in terms of the gender of its members, and that gender can only be male or female. A highly diverse (heterogeneous) team has the same number of male and female members. A poorly diverse (homogeneous) team may consist of all male or all female students. In this case, there are two types of homogeneity: one that tends to the masculine gender and one that tends to the feminine gender. What statistic best suits this case to analyse a team's level of diversity? What happens if, instead of considering gender as a dichotomous variable (two values), we consider it polytomous (multiple values)? Answering these questions will be your primary goal in this project.

    At the beginning of the project, I will provide you with a list of variables on which you should analyse the level of diversity, along with research material based on actual data obtained during module 4: Data & Information. Your first mission will be to review the existing literature to catalogue the strategies that have been followed so far. Then, you must apply a subset of these statistics to a real sample and analyse the effectiveness of the results. As your supervisor, I will provide you with the sources of knowledge and material necessary for you to successfully write a research paper.

    This paper should contain:

    • A review of the most used statistics to measure diversity in student teams, in the form of a taxonomy
    • The performance analysis you performed on the dataset
    • A list of recommendations for comparing teams based on their different attributes

    If the student is interested and the scope of the project allows it, a web application can be developed that reads a spreadsheet with information about the students and the teams they belong to as input and generates a diversity-level report as output.
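
    To make the dichotomous-versus-polytomous question concrete, the sketch below computes Blau's index (also known as the Gini-Simpson index), which is one candidate statistic among those the literature review may turn up. It is shown only as an example, not as the recommended answer; the function names and the team data are hypothetical.

```python
from collections import Counter


def blau_index(values):
    """Blau (Gini-Simpson) index: 1 minus the sum of squared category proportions."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("team must have at least one member")
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def normalized_blau(values, n_categories):
    """Rescale to [0, 1] by dividing by the maximum reachable with n_categories."""
    if n_categories < 2:
        raise ValueError("need at least two possible categories")
    return blau_index(values) / (1.0 - 1.0 / n_categories)


# Hypothetical teams, for illustration only.
balanced = ["F", "M", "F", "M"]
homogeneous = ["M", "M", "M", "M"]
print(blau_index(balanced))                       # 0.5
print(blau_index(homogeneous))                    # 0.0
print(normalized_blau(balanced, n_categories=2))  # 1.0
```

    Note that this index gives the same value for an all-male and an all-female team, so it cannot distinguish the two types of homogeneity described above. The same function works unchanged for polytomous variables, but its maximum then depends on the number of categories, which is why the normalised variant is included.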

  • Automatic detection and classification of scientific papers through the use of NLP

    Supervisors: Yeray Barrios Fleitas, Faizan Ahmed

    In research review papers, one of the most exciting contributions is usually the tables where the reviewed papers are classified according to a list of criteria related to the specific topic. Producing them is an arduous reading job for the researcher, but in practice the information needed to classify a paper is often found in the methodology section. Attributes such as the type of paper (conceptual, experimental, exploratory, etc.), the size of the sample, or the analysed variables are the type of information used for classifying these papers. Could software be trained to read and classify papers according to their methodological nature?

    In this project, your goal will be:

    1. Identify the catalysts that indicate necessary attributes for a generic classification of papers.
    2. Make use of natural language processing (NLP) techniques on a single document to correctly identify the type of study and the sample size.
    3. Apply it to the complete list of articles reviewed in a systematic review of the literature and export a table with the classified papers. Analyse the degree of precision achieved.

    As supervisors, we will provide you with the sources of knowledge and material necessary to write a research paper successfully.
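
    As a rough baseline for step 2, the sketch below extracts candidate sample sizes from a methodology passage with regular expressions. The patterns, the function name `extract_sample_sizes`, and the example sentence are assumptions made for illustration; real papers phrase sample sizes in many more ways, and a project would likely compare such a baseline against proper NLP techniques.

```python
import re

# Hypothetical patterns; they cover only two common phrasings.
SAMPLE_SIZE_PATTERNS = [
    re.compile(r"\b[nN]\s*=\s*(\d[\d,]*)"),
    re.compile(
        r"\b(\d[\d,]*)\s+(?:participants|students|respondents|subjects)\b",
        re.IGNORECASE,
    ),
]


def extract_sample_sizes(text):
    """Return every integer that matches one of the sample-size patterns."""
    found = []
    for pattern in SAMPLE_SIZE_PATTERNS:
        for match in pattern.finditer(text):
            # Drop thousands separators before converting to int.
            found.append(int(match.group(1).replace(",", "")))
    return found


# Hypothetical methodology sentence, for illustration only.
methodology = (
    "We surveyed 1,204 students; a follow-up interview study (n = 35) "
    "was conducted."
)
print(sorted(extract_sample_sizes(methodology)))  # [35, 1204]
```

    Comparing such extracted values against a manually labelled table is one simple way to measure the degree of precision mentioned in step 3.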

    If the student obtains significant results and is interested, they may consider publishing their results in a conference or journal.

  • Condition assessment of sewer and water pipe networks using data-driven models

    Supervisors: Lisandro Jimenez Roa, Mariëlle Stoelinga

    Assessing the condition of sewer and water pipe networks is vital for maintaining the reliability and safety of these essential infrastructure systems. Ageing and deteriorating pipes can cause various issues, such as leaks, breaks, and failures, leading to significant environmental and economic impacts. This project aims to investigate machine learning models for condition assessment in sewer and water pipe networks. These models will analyze historical inspection records to provide predictions of pipe conditions. Students will select and concentrate on one of the following research areas:

    1. Data Quality and Availability: This research area addresses the challenges arising from incomplete, inaccurate, and inconsistent data on sewer and water pipe conditions, which hinders the effectiveness of condition assessment and prediction models. The student will explore strategies for enhancing data quality and availability, such as data fusion techniques or crowdsourcing data. A tangible outcome will be a data quality improvement framework customized for the specific case study.
    2. Predicting Water Pipe Failure: This research area aims to develop more accurate models for predicting water main breaks to reduce service disruptions and enhance overall infrastructure management. Students will investigate machine learning or stochastic models for water pipe failure prediction, including random forests, decision trees, support vector machines, or Markov chain models. A tangible outcome will be a predictive model that can accurately forecast water pipe failures.
    3. Robust Sewer Pipe Deterioration Models: This research area focuses on constructing (or improving existing) models for predicting sewer pipe conditions by developing data-driven models to predict the type and severity of damage in sewer pipe networks. Emphasis will be placed on model validation and prediction uncertainty, contributing to more robust condition assessment and maintenance decision-making. Students will examine techniques such as Bayesian networks, Gaussian processes, ensemble learning, neural networks, decision trees, Markov chains, or regression models. Tangible outcomes will include an enhanced prediction model with quantified uncertainty estimates and a validated sewer pipe deterioration model that can predict damage types and severities with a measurable level of confidence.
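
    Markov chains are among the techniques listed above. The sketch below shows the basic idea for a deterioration model: a pipe's probability distribution over condition states is propagated year by year through a transition matrix. The three states and all transition probabilities are hypothetical values chosen for illustration; in the project they would have to be estimated from the historical inspection data.

```python
def step(distribution, transition):
    """One Markov step: new[j] = sum over i of distribution[i] * transition[i][j]."""
    n = len(distribution)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


def forecast(distribution, transition, years):
    """Propagate the condition-state distribution over a number of yearly steps."""
    for _ in range(years):
        distribution = step(distribution, transition)
    return distribution


# Hypothetical yearly transition probabilities between three condition states
# (good, fair, poor). Pipes only deteriorate here; "poor" is absorbing.
P = [
    [0.90, 0.08, 0.02],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]

new_pipe = [1.0, 0.0, 0.0]  # a pipe that starts in "good" condition
after_10 = forecast(new_pipe, P, years=10)
print([round(p, 3) for p in after_10])
```

    Each row of the matrix must sum to 1. This sketch ignores pipe covariates such as material or location and gives no uncertainty estimate, both of which the research areas above explicitly ask for.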

    Case Study: The project will centre on a real-world case study focusing on either sewer or water networks. The case study will involve historical inspection data, including pipe covariates (material, geometry, location, etc.), damages, and severities. This practical approach will help validate the effectiveness of the developed models and ensure their applicability to real-world situations.

    Some literature of interest:

    • Hawari, Alaa, Firas Alkadour, Mohamed Elmasry, and Tarek Zayed. 2020. 'A state of the art review on condition assessment models developed for sewer pipelines', Engineering Applications of Artificial Intelligence, 93: 103721.
    • Laakso, Tuija, Teemu Kokkonen, Ilkka Mellin, and Riku Vahala. 2018. 'Sewer condition prediction and analysis of explanatory factors', Water, 10: 1239.
    • Nguyen, Lam Van, and Razak Seidu. 2022. 'Application of Regression-Based Machine Learning Algorithms in Sewer Condition Assessment for Ålesund City, Norway', Water, 14: 3993.
    • Sousa, Vitor, José P Matos, and Natércia Matias. 2014. 'Evaluation of artificial intelligence tool performance and uncertainty for predicting sewer structural condition', Automation in Construction, 44: 84-91.
    • Weeraddana, Dilusha, Bin Liang, Zhidong Li, Yang Wang, Fang Chen, Livia Bonazzi, Dean Phillips, and Nitin Saxena. 2020. 'Utilizing machine learning to prevent water main breaks by understanding pipeline failure drivers', arXiv preprint arXiv:2006.03385.

  • Extracting Modelling Information using Natural Language Processing

    Supervisors: Tannaz Zameni, Petra van den Bos

    Behavior-Driven Development (BDD) is an approach to agile software development that focuses on the collaboration of different stakeholders to specify system behavior through scenarios. BDD scenarios provide a structured, textual representation of system behavior, making them valuable resources for software development and testing. In recent work [4][5], we show how to use the information from BDD scenarios to build models that are suitable for automatic test case generation. However, manually identifying and organizing the information needed to construct models from these scenarios can be labor-intensive and prone to errors. By automating the extraction of the necessary data through NLP, this project aims to streamline the preliminary phase of model generation, enhancing software development and testing efficiency.

    The project involves exploring existing NLP tools for parsing BDD scenarios and extracting relevant details for BDD Transition Systems [5]. If off-the-shelf solutions fall short, custom implementations will be considered. The project aims to demonstrate the effectiveness of NLP in extracting modelling data from BDD scenarios, potentially improving the integration of model-based testing with behavior-driven development by simplifying the initial modelling stages.

    To get started, you can follow the steps below:

    1. Perform a literature search on NLP techniques and study the NLP techniques used to integrate BDD and model-based testing (MBT) [1][2][3].
    2. Study the papers that integrate BDD and MBT with formal BDD Transition Systems [4][5].
    3. Select techniques and tools that can be used to extract modelling data.
    4. Apply the NLP techniques to some BDD scenarios and evaluate the results with respect to their appropriateness for use in testing models.
    5. If the results are not satisfactory, consider developing a tool that meets the expectations.
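
    Before any NLP technique is applied, a scenario usually has to be split into its individual steps. The sketch below does this for the common Given/When/Then keywords using only the Python standard library. It is a pre-processing illustration under simple assumptions, not the extraction method of [4][5]; the function name `parse_scenario` and the example scenario are hypothetical.

```python
import re

STEP_RE = re.compile(r"^\s*(Given|When|Then|And|But)\s+(.*\S)\s*$")


def parse_scenario(text):
    """Split a scenario into (keyword, step text) pairs.

    "And" and "But" steps inherit the keyword of the preceding step.
    """
    steps = []
    last_keyword = None
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if not match:
            continue  # skip titles, blank lines, and anything unrecognised
        keyword, body = match.groups()
        if keyword in ("And", "But"):
            if last_keyword is None:
                continue  # nothing to inherit from; ignore the stray step
            keyword = last_keyword
        last_keyword = keyword
        steps.append((keyword, body))
    return steps


# Hypothetical scenario, for illustration only.
scenario = """
Scenario: Withdraw cash
  Given the account balance is 100
  And the card is valid
  When the user withdraws 40
  Then the account balance is 60
"""
for keyword, body in parse_scenario(scenario):
    print(f"{keyword:5} | {body}")
```

    The step texts produced this way could then be passed to an NLP tool to identify the entities, actions, and values needed for a model.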


    [1] A. Gupta, G. Poels, and P. Bera, “Generating multiple conceptual models from behavior-driven development scenarios,” Data & Knowledge Engineering, vol. 145, p. 102141, 2023.

    [2] M. Soeken, R. Wille, and R. Drechsler, “Assisted behavior driven development using natural language processing,” in Objects, Models, Components, Patterns, C. A. Furia and S. Nanz, Eds. Berlin, Heidelberg: Springer, 2012, pp. 269–287.

    [3] J. Fischbach, A. Vogelsang, D. Spies, A. Wehrle, M. Junker, and D. Freudenstein, “Specmate: Automated creation of Test Cases from Acceptance Criteria,” in ICST. IEEE, 2020, pp. 321–331.

    [4] T. Zameni, P. van den Bos, J. Tretmans, J. Foederer, and A. Rensink, “From BDD Scenarios to Test Case Generation,” in ICSTW, Dublin, Ireland, 2023, pp. 36–44.

    [5] T. Zameni, P. van den Bos, A. Rensink, and J. Tretmans, “An Intermediate Language to Integrate Behavior-Driven Development Scenarios and Model-Based Testing,” accepted at VST 2024.