Seminar Series on the Mathematics of Data Science - Department of Applied Mathematics
The MDS Seminar is a lecture series in which both researchers from the University of Twente and external researchers present their current work in the mathematics of data science. The aim is to become better acquainted with the research of other groups and disciplines. The series offers an opportunity for regular exchange as well as a basis for possible collaborations.
Format
Seminars are held on campus and via Teams. They take place fortnightly on Mondays at 16:00 unless stated otherwise (see the program below for dates and rooms).
Upcoming seminars
16 February 2026, 16:00 (RA 2334)
- Speaker: Fabian Mies (TU Delft)
Title: Asymptotics of multiscale scan statistics
Abstract: Optimal rates in nonparametric testing problems can be achieved adaptively via so-called multiscale methods, which resolve the need for bandwidth selection. These approaches have recently received renewed interest in the context of optimal changepoint localization in time series. In this talk, I provide two perspectives on multiscale procedures by viewing them as a specific multiple testing scheme, and by drawing a connection to Hölderian path properties of Brownian motion. In both cases, inference for non-Gaussian data is enabled by a new limit theory to establish the thresholded weak convergence of certain test statistics. Methodological implications for changepoint inference and goodness-of-fit testing are discussed. Lastly, I explain how statistical power can be improved in certain regimes by replacing the supremum-type multiscale statistic by an Orlicz-type aggregation of local information.
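As a small illustration of the kind of statistic discussed in the abstract, the sketch below computes a penalized multiscale scan statistic in the spirit of Dümbgen and Spokoiny: the maximum over all intervals of the standardized partial sum minus a scale-dependent penalty. This is a generic textbook construction, not the speaker's specific method; the function name and all parameters are illustrative.

```python
import numpy as np

def multiscale_scan(y, min_len=5):
    """Penalized multiscale scan statistic (a Dümbgen-Spokoiny-style sketch).

    For each interval I = [s, e) the local statistic is the standardized
    partial sum |sum(y_I)| / sqrt(|I|); the penalty sqrt(2 * log(n / |I|))
    equalizes the noise level across scales, removing the need to pick
    a single bandwidth.
    """
    n = len(y)
    cs = np.concatenate([[0.0], np.cumsum(y)])  # prefix sums for O(1) interval sums
    best = -np.inf
    for s in range(n):
        for e in range(s + min_len, n + 1):
            length = e - s
            t = abs(cs[e] - cs[s]) / np.sqrt(length)
            best = max(best, t - np.sqrt(2.0 * np.log(n / length)))
    return best

rng = np.random.default_rng(0)
noise = rng.standard_normal(200)   # pure noise: statistic stays small
signal = noise.copy()
signal[80:120] += 1.5              # mean shift on one interval
print(multiscale_scan(noise), multiscale_scan(signal))
```

The penalty term is what makes the procedure adaptive: without it, long intervals would dominate the supremum and short-scale changepoints would be missed.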
02 March 2026, 11:00 (CR 2L)
- Speaker: Ivo Stoepker (TU/e)
Title: Inference with Sequential Monte-Carlo Computation of p-values: Fast and Valid Approaches
Abstract: Hypothesis tests calibrated by (re)sampling methods (such as permutation, rank and bootstrap tests) are useful tools for statistical analysis, at the computational cost of requiring Monte Carlo sampling for calibration. It is common practice to execute such tests with a predetermined and large number of Monte Carlo samples, and disregard randomness resulting from this sampling when drawing inference. But what is an appropriate number of Monte Carlo samples? In practice, the choice is typically informed by computational constraints, and based on common default choices. At best, this practice results in computational inefficiency; at worst, it leads to invalid inference. To combat this, a number of approaches have been proposed in the literature, aimed at adaptively guiding analysts in choosing the number of Monte Carlo samples, for example by sequentially deciding when to stop collecting samples and draw inference. These works introduce competing notions of what constitutes "valid" inference, complicating the landscape for analysts seeking suitable methodology. In this talk, I build bridges between these scattered validity notions. I then introduce our new practical sequential methodology, which updates the p-value estimate after each new Monte Carlo sample has been drawn, while retaining important validity notions regardless of the time when Monte Carlo simulation is terminated.
This talk is based on the following joint work with Rui Castro: Inference with Sequential Monte-Carlo Computation of p-values: Fast and Valid Approaches, Statistical Science, in press. The preprint is available here: https://arxiv.org/abs/2409.18908. A companion R package “avseqmc” is available on CRAN: https://CRAN.R-project.org/package=avseqmc.
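For readers unfamiliar with sequential Monte Carlo p-values, the sketch below implements a classical scheme in the spirit of Besag and Clifford (1991): draw null statistics one at a time and stop as soon as a fixed number h of them exceed the observed statistic. This is a standard predecessor of the methodology in the talk, not the approach of the avseqmc package itself; the function name and parameters are illustrative.

```python
import numpy as np

def besag_clifford_pvalue(t_obs, sample_null, h=10, max_draws=10_000, rng=None):
    """Sequential Monte Carlo p-value (Besag & Clifford, 1991, sketch).

    Stop as soon as h null draws reach or exceed t_obs; small p-values
    then require many draws, while large p-values stop early.
    """
    rng = np.random.default_rng(rng)
    exceed = 0
    for m in range(1, max_draws + 1):
        if sample_null(rng) >= t_obs:
            exceed += 1
            if exceed == h:
                return h / m                    # stopped at the h-th exceedance
    return (exceed + 1) / (max_draws + 1)       # usual finite-sample estimate

# Toy example: the null statistic is N(0, 1) and the observed value is 0,
# so the true p-value is 0.5 and the procedure stops after roughly 2h draws.
p = besag_clifford_pvalue(0.0, lambda r: r.standard_normal(), h=10, rng=1)
print(p)
```

The adaptive methods discussed in the talk refine this idea by providing anytime-valid estimates, so inference remains valid no matter when the simulation is terminated.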
09 March 2026, 16:00 (CR 3C)
- Speaker: Chenguang Duan (RWTH Aachen)
Title: Theoretical understanding of distillation and fine-tuning of generative models
Abstract: In the first part of this talk, we introduce the characteristic generator, a novel one-step generative model that combines the sampling efficiency of Generative Adversarial Networks (GANs) with the stability of flow-based models. The model is driven by characteristic curves, along which probability density transport is governed by ordinary differential equations (ODEs). We present a comprehensive theoretical analysis of errors arising from velocity matching, Euler discretization, and characteristic fitting, and establish a non-asymptotic convergence rate in the 2-Wasserstein distance under mild assumptions on the data. Crucially, under a standard manifold assumption, we show that this convergence rate depends only on the intrinsic dimension of the data rather than the ambient dimension, thereby demonstrating that the proposed model mitigates the curse of dimensionality.
In the second part of the talk, we turn to inference-time alignment for diffusion models. Our goal is to adapt a pre-trained diffusion model to a target distribution without retraining the base score network, thus preserving its generative capacity while enforcing desired properties at inference time. To this end, we introduce Doob’s matching, a new framework for guidance estimation grounded in Doob’s $h$-transform. Within this framework, guidance is formulated as the gradient of the logarithm of an underlying Doob's $h$-function. We propose a gradient-penalized regression approach that simultaneously estimates the $h$-function and its gradient, yielding a consistent estimator of the guidance. We establish non-asymptotic convergence guarantees, in the 2-Wasserstein distance, for the distributions generated by the aligned diffusion model. Joint work with Jinyuan Chang, Zhao Ding, Yuling Jiao, Ruoxuan Li, Jerry Zhijian Yang, Yi Xu, and Pingwen Zhang.
16 March 2026, 16:00 (RA 2504)
- Speaker: Michael R.A. Abdelmalik (TU/e)
Title: T.b.a.
11 May 2026, 16:00 (T.b.a.)
- Speaker: Tom Jacobs (CISPA)
Title: T.b.a.
18 May 2026, 11:00 (T.b.a.)
- Speaker: Serte Donderwinkel (RUG)
Title: T.b.a.
28 May 2026, 16:00 (T.b.a.)
- Speaker: Yongdai Kim (Seoul National University)
Title: T.b.a.