
PhD defence Federico Mazzone | Fully Homomorphic Encryption for Privacy-Preserving Collaborative Machine Learning


The PhD defence of Federico Mazzone will take place in the Waaier building of the University of Twente and can be followed via a live stream.

Federico Mazzone is a PhD student in the Department of Semantics, Cybersecurity & Services. His (co)promotors are prof.dr. G. Guizzardi, prof.dr. A. Peter, dr.ing. F.W. Hahn, and dr. M.H. Everts from the Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente.

The increasing need for machine learning (ML) models to train on more diverse and extensive datasets has led multiple data-owning entities, such as hospitals, insurance companies, and municipalities, to seek ways to collaboratively train ML models without compromising the privacy of their training data. Architectural solutions like Federated Learning (FL) are usually combined with cryptographic techniques such as Secure Multi-Party Computation (MPC), Fully Homomorphic Encryption (FHE), Trusted Execution Environments (TEEs), or Functional Encryption (FE) to enable secure collaborative training while preventing information leakage. Among these cryptographic approaches, FHE stands out as particularly promising: it scales efficiently as the number of parties increases and requires no non-collusion assumptions in its security model. However, its high computational overhead makes it impractical for many real-world applications.

This work aims to explore, improve, and design more efficient FHE-based algorithms for privacy-preserving collaborative learning. The main computational bottleneck in FHE-based ML comes from the homomorphic evaluation of non-polynomial functions, which are common in ML (e.g., comparisons in data analysis, activation functions in neural networks). To improve efficiency, we focus on minimizing the number of such costly evaluations through two main strategies:

· by redesigning algorithms that repeatedly invoke non-polynomial functions, minimizing their occurrence while preserving functionality, and

· by identifying computations that do not require encryption to maintain the desired level of privacy and executing them in plaintext.
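To make the cost of the bottleneck concrete: CKKS natively supports only additions and multiplications, so a comparison must be evaluated as a polynomial approximation of the sign function, whose degree (and hence multiplicative depth) drives the cost. The plaintext Python sketch below illustrates the idea with a simple least-squares Chebyshev fit; actual FHE implementations use carefully constructed minimax or composite polynomial approximations rather than this fit.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def approx_sign(x, degree=31):
    """Polynomial stand-in for sign(x). Inputs are assumed scaled to
    [-1, 1], as is standard before comparison under CKKS. Higher degrees
    are more accurate but cost more multiplicative depth encrypted."""
    t = np.linspace(-1, 1, 2001)
    coeffs = C.chebfit(t, np.sign(t), degree)   # least-squares Chebyshev fit
    return C.chebval(x, coeffs)

def approx_greater(x, y):
    # Maps the comparison x > y to (sign(x - y) + 1) / 2 in [0, 1].
    return (approx_sign(x - y) + 1) / 2

print(approx_greater(0.8, 0.2))   # close to 1.0
print(approx_greater(0.1, 0.6))   # close to 0.0
```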

Concretely, for the first strategy we propose new techniques for performing fundamental operations under encryption using the CKKS FHE scheme, particularly those relying on the comparison function (a non-polynomial operation). By re-encoding encrypted input vectors and leveraging the SIMD capabilities of CKKS, we efficiently compute functionalities such as ranking, argmin/argmax, median, and sorting with a single invocation of the comparison function. Some of these operations have direct applications in ML, including max pooling in convolutional networks and argmax computations for output layers. As a case study, we demonstrate how our optimized argmin operation enables efficient privacy-preserving clustering in scenarios where data is vertically partitioned across the training parties.
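As an illustration of the re-encoding idea, here is a plaintext NumPy simulation (not actual CKKS code): replicating the input vector so that all pairs of elements sit in parallel slots lets one batched comparison produce every pairwise result at once, after which the ranks follow from additions alone. Under CKKS, the replication would be realized with rotations over packed ciphertext slots, and the single `>` below would become one polynomial evaluation of the sign function.

```python
import numpy as np

def simd_rank(x):
    """Plaintext simulation of single-comparison ranking.
    Assumes distinct values; under CKKS every array below would live
    in packed ciphertext slots."""
    n = len(x)
    rows = np.repeat(x, n)   # x_0,...,x_0, x_1,...,x_1, ... (rotations under CKKS)
    cols = np.tile(x, n)     # x_0, x_1, ..., x_{n-1}, repeated n times
    # ONE comparison over all n*n pairs at once; this is the only
    # non-polynomial (hence expensive) operation in the whole routine.
    greater = (rows > cols).astype(float)
    # Summing each row counts how many elements x_i exceeds: its rank.
    return greater.reshape(n, n).sum(axis=1)

x = np.array([0.7, 0.1, 0.4, 0.9])
ranks = simd_rank(x)                              # [2. 0. 1. 3.]
argmin_idx = int(np.flatnonzero(ranks == 0)[0])   # 1: position of the minimum
print(ranks, argmin_idx)
```

In the encrypted setting the ranks cannot simply be read off; turning the encrypted rank vector into, say, an argmin indicator takes additional polynomial post-processing, so this sketch shows only the single-comparison core of the technique.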

As for the second strategy, we explore supervised deep learning, where cryptographic approaches can prevent attacks during training, but cannot inherently protect against black-box attacks that extract information through query interactions during inference. We argue that providing absolute security during training is unnecessary if the model is eventually made available for inference. Instead, we identify the layers that leak the most information in neural networks and selectively encrypt only those, ensuring a level of privacy consistent with inference-phase security. To further mitigate black-box attacks, we investigate complementary techniques like knowledge distillation to strengthen privacy across the whole ML pipeline.
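To picture the selective-encryption idea, the sketch below partitions a network so that only the most leakage-prone layers run under FHE. Everything here is a hypothetical illustration: leakage_score, the threshold, and the toy scores are placeholders for whatever empirical leakage measure (e.g., the per-layer advantage of a membership-inference attack) is actually used, not the thesis's concrete procedure.

```python
def partition_layers(layers, leakage_score, threshold):
    """Split a network into plaintext and encrypted parts, encrypting
    only the layers whose estimated information leakage is above the
    threshold. leakage_score is a hypothetical placeholder for an
    empirical leakage measure."""
    plaintext, encrypted = [], []
    for layer in layers:
        (encrypted if leakage_score(layer) > threshold else plaintext).append(layer)
    return plaintext, encrypted

# Toy example with made-up scores, for illustration only.
layers = ["conv1", "conv2", "fc1", "fc2_output"]
scores = {"conv1": 0.1, "conv2": 0.2, "fc1": 0.6, "fc2_output": 0.9}
plain, enc = partition_layers(layers, scores.get, threshold=0.5)
print(plain, enc)   # ['conv1', 'conv2'] ['fc1', 'fc2_output']
```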

Our work takes a step toward making FHE-based privacy-preserving ML more practical for real-world deployment, balancing strong privacy guarantees with computational efficiency.