PhD Defence Hongwei Wen | Statistical Machine Learning Beyond Standard Supervised Learning

The PhD defence of Hongwei Wen will take place in the Waaier building of the University of Twente and can be followed via a live stream.

Hongwei Wen is a PhD student in the department Mathematics of Operations Research. (Co)Promotors are prof.dr. A.J. Schmidt-Hieber, prof.dr. W.M. Koolen and dr. A. Betken from the faculty Electrical Engineering, Mathematics and Computer Science, University of Twente.

Machine learning has revolutionized scientific research, industry, and society, yet many real-world problems fall outside the scope of traditional supervised learning, which assumes abundant labeled data and stable distributions. In practice, challenges such as distribution shifts, incomplete supervision, and noisy or missing labels demand new theoretical frameworks and algorithms.

This thesis addresses four statistical learning problems beyond standard supervised learning, aiming to develop generalizable, robust methods with solid theoretical guarantees.

First, it investigates label shift in transfer learning, where label distributions differ between training and deployment domains. A novel class probability matching (CPM) framework is introduced to estimate target label distributions by aligning class probabilities. CPM is combined with calibrated neural networks and kernel logistic regression, with both algorithms supported by theory and experiments.

Second, it studies partial label learning, where labels are ambiguous. A new family of leveraged weighted (LW) losses is proposed, introducing a leverage parameter to balance losses on partially and fully observed labels. Risk consistency is established, and the approach shows strong empirical results.

Third, it tackles robust kernel regression under heavy-tailed noise via a generalized Cauchy noise assumption. The work proves an equivalence between excess Cauchy risk and L_2-risk for suitable parameters, and achieves almost minimax-optimal rates for kernel Cauchy ridge regression, demonstrating robustness to diverse noise types.
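
As a rough illustration of why a Cauchy-type loss is robust to heavy-tailed noise, the sketch below compares the squared loss with the standard Cauchy loss log(1 + (r/sigma)^2) on residuals of growing size. This is the textbook form of the loss; the generalized version used in the thesis may be parameterized differently, and the scale parameter `sigma` and the function names are assumptions made for this example.

```python
import math


def squared_loss(residual):
    """Squared loss: grows quadratically, so large residuals dominate."""
    return residual ** 2


def cauchy_loss(residual, sigma=1.0):
    """Standard Cauchy loss (assumed form): grows only logarithmically.

    sigma is an illustrative scale parameter, not taken from the thesis.
    """
    return math.log1p((residual / sigma) ** 2)


if __name__ == "__main__":
    # Compare both losses on small and very large residuals.
    for r in [0.1, 1.0, 10.0, 100.0]:
        print(f"residual={r:>6}: squared={squared_loss(r):>10.2f}  "
              f"cauchy={cauchy_loss(r):.3f}")
```

A residual of 100 contributes 10000 to the squared loss but less than 10 to the Cauchy loss, so a single extreme observation has far less influence on the fitted function.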

Finally, it develops high-dimensional density estimation for unsupervised learning through random forest density estimation (RFDE). RFDE is locally adaptive, computationally efficient, and outlier-robust via the median-of-means technique, with provably lower error than single-tree methods.
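
The median-of-means technique mentioned above can be sketched on plain numbers as follows. This shows only the generic estimator (split the data into blocks, average each block, take the median of the block averages), not RFDE itself; how the thesis applies it to tree-based density estimates is not described here. The function name, the fixed shuffle seed, and the toy data are assumptions made for this example.

```python
import random
import statistics


def median_of_means(values, n_blocks, seed=0):
    """Median of block averages: a generic outlier-robust mean estimate."""
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    # Deal the shuffled values into n_blocks groups of near-equal size.
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(block) for block in blocks)


# 99 well-behaved observations plus one gross outlier.
data = [1.0] * 99 + [1000.0]
plain_mean = statistics.fmean(data)
robust_mean = median_of_means(data, n_blocks=5)

if __name__ == "__main__":
    print(f"plain mean:      {plain_mean:.2f}")
    print(f"median of means: {robust_mean:.2f}")
```

The single outlier can contaminate only one of the five blocks, so the median of the block averages stays at the value of the clean data while the plain mean is pulled upward.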

Collectively, these contributions extend the foundations of statistical learning, offering theoretically sound and practically effective tools for complex, real-world machine learning scenarios.