Student project proposal

ID: MP-HW-P-01
Level: MSc
Title: Model Partitioning and Deployment on FPGA Platforms - Latency Improvement
Contact: dr. ir. Uraz Odyurt, dr. Amirreza Yousefzadeh, prof. dr. ir. Ana-Lucia Varbanescu

==========================================================================================
Description

Applications of Machine Learning (ML) models are increasing on a daily basis. While this is the case in industry, scientific applications do not deploy ML-assisted solutions as ubiquitously. However, major experiments are catching up! As an example, and as our point of interest, numerous ML-assisted solutions are being rolled out for deployment in the upcoming High-Luminosity Large Hadron Collider (HL-LHC) upgrade. This is an absolute necessity, as the foreseen scale of data collection will crush the data processing limits of the current state-of-the-art.

While data collection and processing at the LHC involves many distinct layers, the vital task of subatomic particle trajectory reconstruction (tracking) is one of many to utilise ML. Due to its data intensity, tracking is currently considered a post-mortem (offline) task. Utilising specialised hardware and accelerators targeting ML deployment could turn it into an online or a pseudo-online task. When it comes to accelerators, there are a few options, e.g., GPUs, FPGAs and neuromorphic hardware.

We strive to study the caveats of deploying partitioned ML models on FPGAs as the target platform. The ML models considered for this project are based on the Transformer architecture and could easily end up being too large for one device. Another motivation for partitioning and deployment on multiple FPGAs could be improved latency. The main application of these models, as stated above, is tracking in LHC particle collision events. While FPGAs are already incorporated in the parts of LHC data processing where low latency is demanded, other algorithms, such as particle track reconstruction, could seriously benefit from similar implementations.

==========================================================================================
Task

The biggest challenge for FPGA deployment is the actual fitting of the model onto the available hardware. This becomes more complicated when it is known a priori that the model is large and has to be partitioned. While FPGA deployment has been tried in the literature and there are examples of it, we deal with Transformers, a relatively new architecture. This could mean that the research will be purely exploratory.

Initially, the student shall study the state-of-the-art [1, 2] and the available tooling, e.g., hls4ml [3]; a minimal, illustrative conversion sketch is included after the references. A clear workflow with as much automation as possible is expected, alongside performance benchmarking. One or more Transformer models, alongside relevant data sets, shall be provided from our previous work [4]. An FPGA platform of type Zynq UltraScale+ MPSoC will also be available for experimentation.

==========================================================================================
Application

The project is part of an ongoing effort to train, test and deploy ML models for particle track reconstruction for the HL-LHC at CERN, which will drastically increase the scale and frequency of data generation.

==========================================================================================
References

[1] Alonso, 2021, Elastic-DF: Scaling Performance of DNN Inference in FPGA Clouds through Automatic Partitioning.
    URL: https://doi.org/10.1145/3470567
[2] Nechi, 2023, FPGA-based Deep Learning Inference Accelerators: Where Are We Standing?
    URL: https://doi.org/10.1145/3613963
[3] Duarte, 2018, Fast inference of deep neural networks in FPGAs for particle physics.
    URL: https://doi.org/10.1088/1748-0221/13/07/P07027
[4] Caron, 2024, TrackFormers: In Search of Transformer-Based Particle Tracking for the High-Luminosity LHC Era.
    URL: https://doi.org/10.48550/arXiv.2407.07179
==========================================================================================
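Tooling sketch

As a purely illustrative starting point, the snippet below sketches the kind of hls4ml [3] conversion flow referred to in the Task section. It is a minimal sketch, not the project's prescribed workflow: the small Keras model stands in for one partition of a Transformer (the actual models are provided from previous work [4]), and the 'Vitis' backend, the output directory name and the part number (a Zynq UltraScale+ MPSoC, ZCU102-class device) are assumptions chosen for illustration.

import tensorflow as tf
import hls4ml

# Toy stand-in for one partition of the model (e.g., a single dense block);
# the real Transformer models/partitions would be supplied by the project.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(8, activation='softmax'),
])

# Derive an HLS configuration (fixed-point precision, reuse factor) from the model.
config = hls4ml.utils.config_from_keras_model(model, granularity='model')

# Convert to an HLS project targeting an FPGA part; backend and part are assumptions.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    backend='Vitis',
    part='xczu9eg-ffvb1156-2-e',  # Zynq UltraScale+ MPSoC (ZCU102-class), illustrative
    output_dir='hls4ml_partition_prj',
)

# Bit-accurate C simulation of the generated firmware, comparable against the Keras model.
hls_model.compile()

# Full HLS synthesis reports latency and resource estimates (long-running):
# hls_model.build(csim=False, synth=True)

The latency and resource estimates reported by the synthesis step are the natural inputs to the benchmarking and partitioning decisions described under Task.
==========================================================================================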