Interactive Vision-Language Understanding: From Question Answering to Guided Segmentation
Kun Li is a PhD student in the Department of Earth Observation Science. (Co)Promotors are prof.dr.ir. M.G. Vosselman from the Faculty of Geo-Information Science and Earth Observation, University of Twente, and prof. M. Yang from the University of Bath.
Interactive vision-language understanding is an active research field in computer vision with numerous applications in photogrammetry and remote sensing. It focuses on responding interactively to users' queries, for different objectives and under various forms of guidance; its tasks include visual question answering and guided image segmentation. Motivated by the strong performance of deep learning algorithms on multimodal tasks, this PhD thesis investigates interactive vision-language understanding of multimodal data using deep learning techniques. The thesis begins by exploring the construction of a benchmark for aerial image visual question answering (Chapter 2). The explainability of answering models is then studied to present the models' reasoning processes (Chapter 3). The work then turns to leveraging human prompts to train deep learning models for image segmentation. Specifically, it investigates interactive image segmentation (Chapter 4), which enables interaction between users and machines, and referring image segmentation (Chapter 5), which uses text to predict pixel-wise masks.