PhD Defence Kun Li | Interactive Vision-Language Understanding: From Question Answering to Guided Segmentation

Interactive Vision-Language Understanding: From Question Answering to Guided Segmentation

The PhD defence of Kun Li will take place in the Waaier building of the University of Twente and can also be followed via a live stream.

Kun Li is a PhD student in the Department of Earth Observation Science. The (co)promotors are prof.dr.ir. M.G. Vosselman from the Faculty of Geo-Information Science and Earth Observation, University of Twente, and prof. M. Yang from the University of Bath.

Interactive vision-language understanding is an important and active research field in computer vision, with many applications in photogrammetry and remote sensing. It focuses on responding interactively to users' queries, which can serve different objectives and take various forms of guidance; representative tasks include visual question answering and guided image segmentation. Motivated by the strong performance of deep learning algorithms on multimodal tasks, this PhD thesis investigates interactive vision-language understanding of multimodal data using deep learning techniques.

The thesis begins by constructing a benchmark for visual question answering on aerial images (Chapter 2). It then studies the explainability of answering models to reveal their reasoning processes (Chapter 3). Next, the work focuses on leveraging human prompts to train deep learning models for image segmentation. Specifically, it investigates interactive image segmentation (Chapter 4), which enables interaction between users and machines, and referring image segmentation (Chapter 5), which uses text to predict pixel-wise masks.