Extracting Geometric Features of Buildings from Remote Sensing Images
Wufan Zhao is a PhD student in the Department of Earth Observation Science. His supervisors are prof.dr.ir. A. Stein and prof.dr.ir. C. Persello from the Faculty of Geo-Information Science and Earth Observation.
Rapidly growing cities and populations increasingly pose problems for sustainable urban development. To address this, information extraction from remote sensing images is indispensable for assessing the current state of urban areas. Remote sensing technology greatly aids data collection on the urban environment at different levels of spatial detail. In urban areas, buildings are the most prominent and distinctive man-made structures and physical features. Extraction of building objects from earth observation data has traditionally relied on manually constructed feature sets and on expensive airborne photogrammetry and Light Detection and Ranging (LiDAR) data. Such data, however, do not allow a generic, autonomous city modeling procedure. The research described in this dissertation investigates and develops deep learning based methods for geometric building feature extraction and modeling from high resolution remote sensing images.
First, an improved Convolutional Neural Network (CNN) plus Recurrent Neural Network (RNN) workflow was developed for extracting accurate vectorized building outlines from remote sensing imagery. The CNN part serves as an image feature extractor, while the RNN part decodes the sequence of polygon vertices. This study upgraded state-of-the-art feature extraction by introducing global context and boundary refinement blocks and by adding channel and spatial attention modules, both of which improved the effectiveness of the detection module. The framework also introduced stacked convolutional Gated Recurrent Units (conv-GRU) that preserve the geometric relationship between vertices and accelerate inference. The workflow was tested on two open benchmark datasets. Results show that the method significantly improves regularized building outline delineation for buildings of various shapes in complex scenes, a step forward towards full automation in building outline mapping from remote sensing images.
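To make the conv-GRU decoding concrete, the sketch below shows the standard GRU gating for a single scalar feature; a convolutional GRU applies the same update and reset gating at every spatial location, with convolutions in place of the scalar products. This is an illustrative sketch only, not the thesis implementation, and the weight names in `w` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h_prev, x, w):
    """One GRU update for a single scalar feature.

    A conv-GRU performs the same gating per pixel, replacing each
    scalar product below with a convolution. w holds hypothetical
    scalar weights (wz/uz: update gate, wr/ur: reset gate, wh/uh:
    candidate state).
    """
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev)               # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)               # reset gate
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * h_cand                    # gated blend

# Decoding a toy vertex sequence: the hidden state h carries the
# geometric context of previously emitted vertices forward.
w = {"wz": 0.5, "uz": 0.5, "wr": 0.5, "ur": 0.5, "wh": 1.0, "uh": 1.0}
h = 0.0
for x in [0.2, 0.8, -0.3]:  # toy per-vertex CNN features
    h = gru_step(h, x, w)
```

The gated blend is what lets the hidden state retain information about earlier polygon vertices while selectively absorbing new image evidence at each step.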
Second, a Graph Neural Network (GNN) based method was developed that extracts vectorized building rooflines and structures in an end-to-end trainable way. It consists of a Multi-task Learning Module (MLM) designed for geometric primitive extraction and matching, and a GNN based Relation Reasoning Module (RRM) that reconstructs the roofline structure. By introducing global geometric line priors by means of the Hough transform, the method enhances geometric feature extraction. Residual Graph Convolutional Networks (Res-GCN) proved efficient and suitable for tackling the vanishing gradient problem that occurs in GNNs, resulting in increased detection accuracy. The method was evaluated on the standard Vectorizing World Buildings (VWB) dataset and a custom Enschede dataset. It improves msAP and FH by 0.6/1.3 and 1.2/2.1 on the two datasets while using only half the training time of competing methods, which indicates its suitability for reconstructing planar roof structures.
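The residual graph convolution can be sketched as one layer computing H + ReLU(Â H W), where Â is a normalized adjacency matrix over roof vertices: the skip connection lets gradients bypass the neighborhood aggregation, which is how deep GNN stacks avoid vanishing gradients. The toy example below is a minimal pure-Python sketch under that formulation, not the thesis code; the graph and weights are invented for illustration.

```python
def matmul(a, b):
    """Dense matrix product for small list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def res_gcn_layer(h, a_norm, w):
    """One residual graph-convolution layer: H + ReLU(A_norm H W).

    The identity skip connection carries node features (and gradients)
    past the aggregation step, mitigating vanishing gradients when
    many such layers are stacked.
    """
    agg = relu(matmul(matmul(a_norm, h), w))
    return [[hv + av for hv, av in zip(hr, ar)] for hr, ar in zip(h, agg)]

# Toy roof graph: nodes 0 and 1 are connected corners, node 2 is isolated.
a_norm = [[0.5, 0.5, 0.0],
          [0.5, 0.5, 0.0],
          [0.0, 0.0, 1.0]]
h = [[1.0], [0.0], [2.0]]          # one feature per node
out = res_gcn_layer(h, a_norm, [[1.0]])  # → [[1.5], [0.5], [4.0]]
```

With identity weights, connected nodes blend their features while every node keeps its own input via the residual term.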
Third, a vision Transformer structure called Roof-Former is introduced that optimizes the efficiency and accuracy of roof structure extraction. The structure comprises an image encoder with edge node initialization, image feature fusion with an enhanced segmentation refinement branch, and an edge filtering and structural reasoning module. An enhanced feature pyramid module makes the image encoder flexible for multi-scale learning while reducing resource consumption during training. In addition, the collaborative segmentation refinement branch ensures the consistency of spatial and topological relations. Experimental results on the VWB and Enschede datasets show that this structure improves on existing state-of-the-art methods.
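The multi-scale idea behind a feature pyramid can be illustrated with a minimal sketch: the same feature map is pooled to successively coarser resolutions, so later modules can attend to both fine edges and coarse roof layout. This is a schematic pure-Python illustration of the general technique, not Roof-Former's actual pyramid module; the function names are hypothetical.

```python
def avg_pool2(m):
    """2x2 average pooling with stride 2 on a square feature map."""
    n = len(m) // 2
    return [[(m[2 * i][2 * j] + m[2 * i][2 * j + 1] +
              m[2 * i + 1][2 * j] + m[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(n)] for i in range(n)]

def feature_pyramid(m, levels):
    """Return the feature map at successively coarser scales.

    Level 0 is the input resolution; each further level halves the
    spatial size, trading detail for context (and compute).
    """
    pyr = [m]
    for _ in range(levels - 1):
        m = avg_pool2(m)
        pyr.append(m)
    return pyr

pyr = feature_pyramid([[1.0] * 4 for _ in range(4)], 3)  # sizes 4, 2, 1
```

Processing coarser levels is cheap (quadratically fewer positions), which is one reason pyramid encoders reduce resource consumption during training.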
Finally, an Unsupervised Domain Adaptation (UDA) method was developed for height estimation from monocular remote sensing images, addressing the limited access to normalized Digital Surface Model (nDSM) data. The method consists of an image translation stage and a representation learning stage. Mismatched semantic distributions across domains are addressed by enhancing the generator's semantic robustness. A multi-task training network enables spatially fine-grained representation learning for joint height estimation and semantic segmentation. Experimental findings on the ISPRS benchmark and Enschede datasets show that the method has advantages over comparable methods and achieves results similar to supervised learning. This work opens up a wide range of applications based on elevation information, such as large scale 3D city modeling.
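A common way to train such a multi-task network is to optimize a weighted sum of a regression loss for height and a classification loss for segmentation. The sketch below shows one generic formulation (L1 for height, cross-entropy for segmentation); it is an assumed illustration of the multi-task idea, not the loss used in the thesis, and the balancing weight `lam` is a hypothetical hyperparameter.

```python
import math

def multitask_loss(h_pred, h_true, seg_probs, seg_labels, lam=0.5):
    """Joint loss: mean L1 height error plus lam * mean cross-entropy.

    h_pred/h_true: per-pixel height predictions and references.
    seg_probs: per-pixel class probability vectors (already softmaxed).
    seg_labels: per-pixel ground-truth class indices.
    Sharing one encoder across both losses encourages spatially
    fine-grained features useful for height and semantics alike.
    """
    l1 = sum(abs(p - t) for p, t in zip(h_pred, h_true)) / len(h_true)
    ce = -sum(math.log(p[y]) for p, y in zip(seg_probs, seg_labels)) / len(seg_labels)
    return l1 + lam * ce

# Perfect toy predictions on two pixels: both terms vanish.
loss = multitask_loss([10.0, 5.0], [10.0, 5.0],
                      [[1.0, 0.0], [0.0, 1.0]], [0, 1])
```

Tuning `lam` trades off the two tasks; in practice the segmentation term mainly acts as a regularizer that sharpens the height predictions at object boundaries.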
To summarize, this thesis presents the development and application of efficient, robust, and scalable deep learning methods for extracting geometric models of buildings from high resolution remote sensing images. Key challenges included predicting polygons directly from raster images and estimating 3D information from 2D images. In this way, the dissertation contributes effective solutions for image-based 3D building reconstruction in large-scale and complex application scenes.