Methods for selective sweep detection using Convolutional Neural Networks
Hanqing Zhao is a PhD student in the Department of Computer Architecture Design and Test for Embedded Systems. (Co)Promotors are prof.dr.ir. M.J.G. Bekooij and dr.ir. N. Alachiotis from the Faculty of Electrical Engineering, Mathematics and Computer Science.
Positive selection is a key concept in evolutionary biology and population genetics, describing how advantageous genetic variants increase in frequency due to their beneficial effects on fitness. This process shapes genetic diversity and drives adaptation, allowing populations to respond to environmental and ecological changes. As a beneficial allele rises toward fixation, genetic variation at nearby linked sites decreases, a phenomenon known as a selective sweep. Detecting such sweeps relies on identifying their characteristic genomic signatures.
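As an illustration of one such signature, the sketch below computes the mean number of pairwise differences between haplotypes in non-overlapping windows. It is a minimal example, not a method from the thesis: the function names and the toy alignment are hypothetical, and the middle window is invariant by construction to mimic the local loss of diversity that a sweep leaves behind.

```python
from itertools import combinations


def pairwise_diversity(haplotypes):
    """Mean number of pairwise differences between haplotypes (rows of 0/1 alleles)."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        return 0.0
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


def windowed_diversity(haplotypes, window, step):
    """Return (start_index, diversity) for each window along the alignment."""
    n_sites = len(haplotypes[0])
    profile = []
    for start in range(0, n_sites - window + 1, step):
        chunk = [h[start:start + window] for h in haplotypes]
        profile.append((start, pairwise_diversity(chunk)))
    return profile


# Hypothetical toy alignment: 4 haplotypes x 12 sites.
# Sites 4-7 are identical across all haplotypes (assumed sweep region).
toy = [
    [0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
]

profile = windowed_diversity(toy, window=4, step=4)
for start, diversity in profile:
    print(f"sites {start}-{start + 3}: diversity = {diversity:.2f}")
```

In this toy case the diversity of the middle window is zero while the flanking windows remain variable. Real scans face a much noisier version of the same contrast, which is why demographic history can confound simple window-based statistics.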
This thesis presents a deep learning framework for genome-wide selective sweep detection. First, a lightweight convolutional neural network (CNN), SweepNet, is introduced. SweepNet achieves higher training efficiency than existing CNNs in population genetics and remains robust across diverse demographic models. With fewer trainable parameters and a constant network size regardless of sample size or the number of single-nucleotide polymorphisms (SNPs), it offers computational efficiency without sacrificing accuracy.
Building on SweepNet, a scalable CNN classifier, FAST-NN, is developed to further improve efficiency and accuracy. FAST-NN is a summary-statistic-free method that learns to detect two types of selective sweep signatures using one-dimensional input vectors derived directly from raw genomic data, enabling scalability to large datasets. To enhance classification performance, several data rearrangement algorithms are proposed to reorder genomic matrices based on informatics and genetic principles, improving both interpretability and accuracy.
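To give a sense of what reordering a genomic matrix can mean, the sketch below sorts the haplotypes (rows) of a binary matrix by their number of derived alleles. This is an assumed, generic rearrangement chosen for illustration and is not necessarily one of the algorithms proposed in the thesis. The function name and toy matrix are hypothetical.

```python
def sort_rows_by_derived_count(matrix):
    """Reorder haplotypes (rows) by descending count of derived alleles (1s).

    sorted() is stable, so rows with equal counts keep their original order.
    """
    return sorted(matrix, key=sum, reverse=True)


# Hypothetical toy matrix: 4 haplotypes x 3 SNPs, 1 = derived allele.
toy_matrix = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]

reordered = sort_rows_by_derived_count(toy_matrix)
for row in reordered:
    print(row)
```

Because the order of sampled haplotypes carries no biological meaning, a deterministic ordering of this kind places similar rows next to each other, which can give a convolutional filter more consistent local patterns to learn from.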
Finally, a CNN-based framework, ASDEC, is introduced as the first deep learning tool capable of scanning entire genomes, localizing targets of selection, and estimating the extent of selective sweeps. ASDEC also provides a foundation for neural architecture search applications in population genomics.
Collectively, the methods developed in this thesis establish a scalable, efficient, and accurate framework for selective sweep detection, offering new opportunities for analyzing real-world genomic datasets and advancing computational approaches to evolutionary inference.



