Machine Learning | Topics and Proposals

For background information on the topic as a whole, scroll to the end of this page.

Available Project Proposals

If you are interested in the general topic of Machine Learning, or if you have your own project idea related to the topic, please contact us directly. Alternatively, you can work on one of the following concrete project proposals:

Point cloud classification, segmentation, and visualization (Faizan Ahmed)

Supervisors: Faizan Ahmed

Point clouds are sets of spatial data points captured by 3D scanning techniques such as lidar. These point clouds contain many millions of points, resulting in 3D representations of the railway environment. Point cloud data can be used to train machine learning models that automatically classify objects in the rail infrastructure.

The main research question is: How can point clouds for railway catenary systems be reliably segmented, classified, and visualized with scant computational power, memory, and ground truth labels?

However, the student can focus on one or a combination of the topics given below:

Create an end-to-end pipeline for point cloud visualisation using existing libraries
Visualize model output (or intermediate processing) to explore interpretability/explainability
Apply XAI techniques to point clouds
Leverage existing trained models for point cloud labelling
Pre-process (down-sampling, augmentation, filtering, voxelization, etc.) point clouds for accurate model training
Train deep learning models for point cloud segmentation and object detection
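
As an illustration of the pre-processing topic, the following is a minimal sketch of voxel-grid down-sampling, one common way to reduce the memory footprint of a point cloud before training. It assumes the cloud is available as an (N, 3) NumPy array of x, y, z coordinates; the function name voxel_downsample and the example coordinates are hypothetical and not taken from any project code.

```python
import numpy as np


def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Replace all points that fall into the same cubic voxel by their centroid."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    if len(points) == 0:
        return points.copy()
    # Integer voxel index of every point along each axis.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    # Map each distinct voxel to a group id in 0..n_voxels-1.
    _, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against NumPy versions returning a 2-D inverse
    n_voxels = int(inverse.max()) + 1
    # Sum the coordinates per voxel, then divide by the number of points in it.
    sums = np.zeros((n_voxels, points.shape[1]))
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse, minlength=n_voxels)
    return sums / counts[:, None]


# Hypothetical toy cloud: two points share a voxel, the third lies in another one.
cloud = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.5, 0.0, 0.0]])
reduced = voxel_downsample(cloud, voxel_size=1.0)
```

A larger voxel size removes more points at the cost of geometric detail, so in practice the value would have to be tuned against the thin structures typical of catenary systems.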

Some previous assignments related to this topic:

Validation of Data Preprocessing Pipeline with Model-Based Testing (Petra van den Bos, Maurice van Keulen)

In Data Science, much effort goes into the implementation of a data preprocessing and model training pipeline. It is important that the resulting system is validated on many more aspects than just the predictive power of the resulting model. One aspect that is in dire need of additional validation is the data preprocessing part. It should produce data of sufficient quality for model training and testing. The main question in this research project is: how to effectively validate a data preprocessing pipeline?

Some of this validation should already be done before all implementation has been completed, i.e., by specifying the preprocessing pipeline at a high level and checking whether basic properties hold. After implementation, the specification can be used for model-based testing. Here the high-level specification models what the pipeline is expected to do. Specifically, in model-based testing the following two steps are taken: (i) a test generation algorithm is applied to the specification to automatically derive tests, and (ii) these tests are then executed on the implementation of the pipeline to check that it functions as specified.
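
The two steps can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the model is a single range constraint on one attribute, test generation is plain boundary-value selection rather than a full model-based test generation algorithm, and all names (RangeSpec, generate_tests, run_tests, clean_age) are hypothetical. The implementation under test contains a deliberate off-by-one bug so that the generated tests have something to find.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class RangeSpec:
    """Hypothetical model of one cleaning step: keep values in [low, high], else None."""
    attribute: str
    low: int
    high: int

    def expected(self, value: Optional[int]) -> Optional[int]:
        if value is None or not (self.low <= value <= self.high):
            return None
        return value


def generate_tests(spec: RangeSpec) -> List[Tuple[Optional[int], Optional[int]]]:
    """Step (i): derive (input, expected output) pairs from the model alone."""
    inputs = [None, spec.low - 1, spec.low, spec.low + 1,
              spec.high - 1, spec.high, spec.high + 1]
    return [(value, spec.expected(value)) for value in inputs]


def run_tests(spec: RangeSpec, implementation: Callable) -> list:
    """Step (ii): execute the derived tests and collect (input, expected, actual) failures."""
    failures = []
    for value, expected in generate_tests(spec):
        actual = implementation(value)
        if actual != expected:
            failures.append((value, expected, actual))
    return failures


def clean_age(value: Optional[int]) -> Optional[int]:
    """Implementation under test; '>=' should be '>' (deliberate bug for the demo)."""
    if value is None or value < 0 or value >= 120:
        return None
    return value


age_spec = RangeSpec(attribute="age", low=0, high=120)
failures = run_tests(age_spec, clean_age)
print(failures)  # the upper-boundary input 120 is reported as a failure
```

The point of the design is that the tests are never written by hand: changing the bounds in the model changes the generated tests automatically.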

The idea for the project is to establish a high-level modeling language for data preprocessing pipelines, and an approach, based on Model-Based Testing, that can then validate this data preprocessing pipeline.

Example: Imagine a data science project developing a machine learning model for predicting the duration of heart surgeries for the purpose of improving the planning of surgeries. The developed predictive model is meant to predict the duration of a surgery given data on the characteristics of the patient, their condition, the type of surgery, the staff involved, and contextual factors. The data preprocessing will include many cleaning actions on individual attributes, deletion of rows with too many missing values, derivation of other attributes, etc. One could imagine making a high-level specification of these steps, which can then be validated on properties like completeness (is at least 80% of the data still left?) and bias (are certain subsets of surgeries still represented in the data with roughly the same ratios?).
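
The two properties from the example could be checked along the following lines. This is a hedged sketch, not a proposed solution: the records, the attribute names, the drop_incomplete step, and the 5 percentage point tolerance for the bias check are all invented for illustration; only the 80% completeness threshold comes from the example above.

```python
from collections import Counter


def completeness(raw_rows: list, clean_rows: list) -> float:
    """Fraction of the raw rows that survive preprocessing."""
    return len(clean_rows) / len(raw_rows)


def max_ratio_shift(raw_rows: list, clean_rows: list, key: str) -> float:
    """Largest absolute change in the share of any subgroup, before vs. after."""
    raw = Counter(row[key] for row in raw_rows)
    clean = Counter(row[key] for row in clean_rows)
    return max(
        abs(raw[group] / len(raw_rows) - clean[group] / len(clean_rows))
        for group in raw
    )


def drop_incomplete(rows: list) -> list:
    """Hypothetical preprocessing step: delete rows with a missing value."""
    return [row for row in rows if None not in row.values()]


# Invented toy data: 6 bypass and 4 valve surgeries, one incomplete row of each type.
raw = (
    [{"type": "bypass", "age": 60}] * 5 + [{"type": "bypass", "age": None}]
    + [{"type": "valve", "age": 70}] * 3 + [{"type": "valve", "age": None}]
)
clean = drop_incomplete(raw)

kept = completeness(raw, clean)
shift = max_ratio_shift(raw, clean, key="type")
completeness_ok = kept >= 0.80  # threshold taken from the example
bias_ok = shift <= 0.05         # assumed tolerance, not from the source
```

Both checks assume that the raw and cleaned data are non-empty; a real validation approach would also have to decide which subgroups and tolerances matter for the application.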

Contact

Background

Machine learning helps to learn from enormous amounts of data gathered as a direct or indirect product of a process, whether it be an industrial product line, efficient farming for food production, prediction of bacterial growth, or any other industrial, healthcare-related, or technological endeavor. Similar to humans, machine learning extracts patterns from data if enough examples are provided. Reinforcement learning is an action-oriented type of learning in which an agent tries to maximise a given reward signal, without being told explicitly which actions to take. This type of learning is often combined with other types of machine learning. For instance, it is widely combined with deep learning in scenarios with large state spaces.

The topic of machine learning leads to projects of a varying nature, such as theoretical work (algorithm development, computational efficiency and convergence of machine learning algorithms, formal representation), algorithm training for classification or regression (see some example projects below), predictive maintenance, and reinforcement learning.

Related Modules

Intelligent interaction design