MDS Seminar

Seminar Series on the Mathematics of Data Science - Department of Applied Mathematics

With the MDS Seminar, we would like to launch a lecture series in which both researchers from the University of Twente and external researchers present their current work in the field of mathematics of data science. The aim is to get to know and understand the research of other groups and disciplines better. It offers the opportunity for regular exchange as well as a basis for possible collaborations.

Format
Seminars are held on campus and via Teams. They take place every other Monday at 4 p.m. unless otherwise stated (see the program below for dates and rooms).

Upcoming seminars

11 May 2026, 16:00 (RA 2504)

Speaker: Tom Jacobs (CISPA)
Title: Controlling Implicit Regularization in Deep Learning via Weight Decay and Mirror Descent
Abstract: Classical learning theory predicts that overparameterized models should overfit, yet deep neural networks generalize well in this regime. A possible explanation for this is implicit regularization: gradient-based optimization biases solutions toward low-complexity structures (e.g., sparsity or low rank) even without explicit constraints, as observed in settings such as matrix sensing and attention models. In this seminar, I show that weight decay controls this bias: beyond its explicit role as L2-regularization, it modifies the optimization geometry (mirror map), effectively shifting the implicit regularization toward L1-type behavior and thereby promoting sparsity. By turning off weight decay during training, only the implicit effect remains, leading to better generalization. Leveraging this perspective, I introduce PILoT (Parametric Implicit Lottery Ticket), a sparsification method that exploits overparameterization and the L2-to-L1 transition in implicit regularization to produce sparse networks with minimal performance degradation. Building on these insights, I further introduce HAM (Hyperbolic Aware Minimization), a lightweight optimization method that captures the sparsity-inducing implicit bias using mirror descent, thereby directly controlling the implicit bias and leading to improved standard training and state-of-the-art performance in finding sparse networks.

18 May 2026, 11:00 (OH 218)

Speaker: Serte Donderwinkel (RUG)
Title: Counting connected graphs (or: how to estimate extremely rare events?)
Abstract: How many connected graphs have a prescribed degree sequence? This classical combinatorial question turns out to admit a surprisingly natural probabilistic interpretation.
In joint work with Sasha Bell and Remco van der Hofstad, we derive asymptotic formulas for the number of connected graphs with a given degree sequence in the sparse regime. Our approach is based on the probabilistic method: rather than counting graphs directly, we study a random graph model in which the desired structures appear with a certain probability.
A major challenge is that connectivity is exponentially unlikely when many vertices have degree 1. Estimating probabilities of such rare events is notoriously difficult. We overcome this by using a probabilistic “change of perspective” that turns the rare event into a typical one. Concretely, we construct a larger random graph in which the target degree sequence typically emerges in the giant connected component. This viewpoint not only allows us to estimate exponentially small probabilities, but also reveals the most likely mechanism by which the rare event occurs.
Along the way, I will introduce several probabilistic ideas that have become central in modern network science and random graph theory, including the configuration model, branching process approximations, and local weak convergence, and explain how these tools combine to yield asymptotic counting results.

28 May 2026, 16:00 (RA 2503)

Speaker: Yongdai Kim (Seoul National University)
Title: A Composite Activation Function for Learning Stable Binary Representations
Abstract: Activation functions play a central role in neural networks by shaping internal representations.
Recently, learning binary activation representations has attracted significant attention due to their advantages in computational and memory efficiency, as well as interpretability. However, training neural networks with Heaviside activations remains challenging, as their non-differentiability obstructs standard gradient-based optimization.
In this talk, we propose the Heavy-Tailed Activation Function (HTAF), a smooth approximation to the Heaviside function that enables stable training with gradient-based optimization. We construct HTAF as a sigmoid–hyperbolic tangent composite function and theoretically show that it maintains a large gradient mass around zero inputs while exhibiting slower gradient decay in the tail regions. We show empirically that Spiking Neural Networks, Binary Neural Networks and Deep Heaviside Neural Networks can be trained stably using HTAF with gradient-based optimization. Finally, we introduce the Implicit Concept Bottleneck Model (ICBM), an interpretable image model that leverages HTAF to induce discrete feature representations. Extensive experiments across various architectures and image datasets demonstrate that ICBM enables stable discretization while achieving prediction performance comparable to or better than standard models.

8 June 2026, 15:15 (RA 2504)

Speaker: Sebastian Kassing (Bergische Universität Wuppertal)
Title: Fast local convergence rates for stochastic gradient methods under the PL-inequality: A geometric approach
Abstract: Gradient methods are among the most widely used algorithms for optimization. Their behavior is well understood when the objective function is strongly convex, but many modern problems fall outside this classical setting. A useful alternative is the Polyak-Łojasiewicz (PL) inequality, which can guarantee convergence even without convexity. The simplicity of the PL-inequality allows for quick plug-and-play proofs with comparatively good results. More recently, Boumal and Rebjock classified the geometry of the loss landscapes for functions satisfying the PL-inequality. This allows for geometric approaches leading to faster convergence rates. We follow this approach to analyze stochastic gradient descent and Polyak's heavy ball method. This is based on joint work with Thomas Kruse (Wuppertal) and Simon Weissmann (Mannheim).

8 June 2026, 16:00 (RA 2504)

Speaker: Kelan Gray (Imperial College London)
Title: Structure-Preserving Koopman Operator Learning
Abstract: Koopman operators are infinite-dimensional operators that globally linearize nonlinear dynamical systems, with their spectral properties encoding essential information about the underlying dynamics. A common approach to recovering such operators from data is to build a Galerkin approximation on a suitable finite-dimensional subspace. However, such approximations face two key challenges: (1) they can corrupt important spectral information through a process known as spectral pollution, and (2) they require judicious selection of a suitable subspace to faithfully capture the system’s behaviour. I will discuss two approaches that have recently emerged to address these issues. First, structure-preserving methods have broken new ground by exploiting more than just the linearity of the Koopman operator, such as multiplicativity, which leads to more faithful spectral approximations. Second, deep neural networks provide a powerful and flexible framework for data-driven subspace selection, circumventing the need for feature engineering. I will also present a new approach that harnesses the representational power of neural networks while preserving the multiplicative structure of the Koopman operator.

15 June 2026, 16:00 (RA 2502)

Speaker: Alexis Derumigny (TU Delft)
Title: T.b.a.

5 October 2026, 12:40 (TBA)

Speaker: Chen Zhou (Erasmus University Rotterdam)
Title: T.b.a.

19 October 2026, 12:40 (TBA)

Speaker: Anas Mourahib (Eindhoven University of Technology)
Title: T.b.a.

30 November 2026, 12:40 (TBA)

Speaker: Nick Koning (Erasmus University Rotterdam)
Title: T.b.a.