Advancing Efficiency in Neural Networks through Sparsity and Feature Selection
Zahra Atashgahi is a PhD student in the department of Data Management & Biometrics. (Co)Promotors are prof.dr.ir. R.N.J. Veldhuis and dr.ir. D.C. Mocanu from the faculty of Electrical Engineering, Mathematics and Computer Science, and prof.dr. M. Pechenizkiy from Eindhoven University of Technology.
Deep neural networks (DNNs) have attracted considerable attention in recent years due to their promising results in various applications. Nevertheless, their extensive model size and over-parameterization have brought a significant challenge to the forefront: escalating computational costs. These challenges are exacerbated when dealing with high-dimensional data, as the complexity and resource requirements of DNNs grow significantly. Consequently, deep learning models incur substantial training and inference costs in both memory and computation, making them ill-suited for scenarios with constrained computational resources and limited battery life.
Sparse neural networks (SNNs) have emerged as a prominent approach to addressing the over-parameterization inherent in DNNs and thus mitigating the associated costs. By keeping only the most important connections of a DNN, they achieve results comparable to their dense counterparts with significantly fewer parameters. However, most current SNN-based solutions for reducing computational costs gain efficiency only at inference time while remaining resource-intensive during training. Furthermore, these solutions predominantly focus on a restricted set of application domains, particularly vision and language tasks.
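To make the core idea concrete, the sketch below shows magnitude-based pruning, one common (though not the only) criterion for deciding which connections of a layer are "most important": weights with the largest absolute value are kept and the rest are zeroed out. This is a minimal illustrative example, not code from the thesis; the function name and the choice of magnitude as the importance score are assumptions made here for clarity.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out all but the largest-magnitude weights of a layer.

    Keeps roughly a fraction (1 - sparsity) of the connections,
    a common proxy for 'the most important connections'.
    """
    k = int(weights.size * (1.0 - sparsity))  # number of weights to keep
    if k == 0:
        return np.zeros_like(weights)
    threshold = np.sort(np.abs(weights), axis=None)[-k]  # k-th largest magnitude
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128))
w_sparse = magnitude_prune(w, sparsity=0.95)  # keep ~5% of connections
print(f"density: {np.count_nonzero(w_sparse) / w.size:.3f}")
```

Note that pruning a trained dense network in this way only saves cost at inference time; the full dense model still has to be trained first, which is exactly the limitation motivating the training-time methods discussed next.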
This Ph.D. research aims to address these challenges by introducing Cost-effective Artificial Neural Networks (CeANNs), designed to achieve a targeted performance across diverse and complex machine learning tasks while demanding minimal computational, memory, and energy resources during both training and inference. We study CeANNs from two primary perspectives: model efficiency and data efficiency. In essence, we leverage the potential of SNNs to reduce both the number of model parameters and the dimensionality of the data, thereby facilitating efficient training and deployment of artificial neural networks. This work results in artificial neural networks that are more practical and accessible for real-world applications, with a key emphasis on cost-effectiveness. Within this thesis, we present the methodologies we developed to advance efficiency. Our contributions can be summarized as follows:
Part I. Advancing Training and Inference Efficiency of DNNs through Sparsity.
This part of the thesis focuses on enhancing the model efficiency of DNNs through sparsity. The high computational cost of DNNs, stemming primarily from their large, over-parameterized layers, highlights the need for computation-aware design in both model architecture and training methods. In Part I, we leverage sparsity to address this challenge, focusing on achieving a targeted performance in extremely sparse neural networks and on efficient time series analysis with DNNs. We propose two algorithms to tackle these issues: a dynamic sparse training (DST) algorithm for learning in extremely sparse neural networks (Chapter 2) and a methodology for obtaining SNNs for time series prediction (Chapter 3). In essence, our goal is to enhance the training and inference efficiency of DNNs through sparsity while addressing specific challenges in underexplored application domains, particularly tabular and time series data analysis.
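For readers unfamiliar with dynamic sparse training, the sketch below illustrates the prune-and-regrow cycle that the DST family builds on, in the spirit of Sparse Evolutionary Training (SET, Mocanu et al., 2018): periodically drop a fraction of the weakest active connections and grow the same number elsewhere, so the network stays sparse throughout training. The function name, the regrowth-at-random criterion, and the hyperparameters are illustrative assumptions; the algorithm proposed in Chapter 2 differs in its details.

```python
import numpy as np

def prune_and_regrow(weights, mask, zeta, rng):
    """One prune-and-regrow step of dynamic sparse training.

    Drops a fraction `zeta` of the weakest active connections, then
    grows the same number at random inactive positions, keeping the
    total density (and hence the memory footprint) constant.
    """
    active = np.flatnonzero(mask)
    n_drop = int(zeta * active.size)
    # Drop the active connections with the smallest magnitude.
    weakest = active[np.argsort(np.abs(weights.ravel()[active]))[:n_drop]]
    mask.ravel()[weakest] = False
    weights.ravel()[weakest] = 0.0
    # Regrow the same number of connections at random empty positions.
    inactive = np.flatnonzero(~mask)
    grown = rng.choice(inactive, size=n_drop, replace=False)
    mask.ravel()[grown] = True
    weights.ravel()[grown] = rng.normal(scale=0.01, size=n_drop)
    return weights, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128))
mask = rng.random(w.shape) < 0.05  # start at ~5% density
w *= mask
w, mask = prune_and_regrow(w, mask, zeta=0.3, rng=rng)  # run between training epochs
```

Because the density never exceeds its initial value, both memory and computation stay bounded during training as well as inference, which is the key advantage of DST over pruning a fully trained dense network.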
Part II. Leveraging Feature Selection for Efficient Model Development.
In the pursuit of cost-effective artificial neural networks, it is crucial to address the challenges posed by high-dimensional input data, which can hinder scalability and introduce issues such as the curse of dimensionality and over-fitting. One promising avenue for tackling these challenges is feature selection, a technique designed to identify the most relevant and informative attributes of a dataset. However, most existing feature selection methods are computationally expensive, especially on high-dimensional datasets or those with a large number of samples. To address this issue, in the second part of the thesis we propose, for the first time, to exploit SNNs to perform efficient feature selection. We present two feature selection methods, one unsupervised (Chapter 4) and one supervised (Chapter 5), specifically designed to offer effective solutions to the challenges of high dimensionality while maintaining computational efficiency. As we show in Chapter 5, using less than 10% of the parameters of the dense network, our proposed method achieves the highest ranking-based score for finding high-quality features among state-of-the-art feature selection methods. Combining feature selection with neural networks offers a powerful strategy that enhances the training process and performs dimensionality reduction, thereby advancing the overall efficiency of model development.
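To illustrate how a sparse network can double as a feature selector, the sketch below scores each input feature by the summed magnitude of the connections that survive sparse training at its input neuron, and keeps the strongest features. This "neuron strength" criterion, the function name, and the toy data are assumptions made here for illustration; the methods in Chapters 4 and 5 are more elaborate.

```python
import numpy as np

def select_features(input_layer_weights: np.ndarray, k: int) -> np.ndarray:
    """Rank input features by the 'strength' of their input neurons.

    Scores each feature by the summed magnitude of the sparse
    connections leaving its input neuron and returns the indices
    of the k strongest features.
    """
    strength = np.abs(input_layer_weights).sum(axis=1)  # one score per input feature
    return np.argsort(strength)[::-1][:k]

# Toy usage: 1000-dimensional inputs, a ~5%-dense first layer, pick 50 features.
rng = np.random.default_rng(0)
w = rng.normal(size=(1000, 64)) * (rng.random((1000, 64)) < 0.05)
top_features = select_features(w, k=50)
```

Because the score is read directly off the sparse input layer, the cost of ranking features scales with the number of surviving connections rather than with a full dense layer, which is where the efficiency gain over conventional feature selection methods comes from.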
In conclusion, this research focuses on developing cost-effective artificial neural networks that deliver a targeted performance while minimizing computational, memory, and energy resources. It explores CeANNs from two perspectives: model efficiency and data efficiency. The first part of the thesis addresses model efficiency through sparsity, proposing algorithms for efficient training and inference of DNNs across various data types. The second part leverages SNNs to efficiently select an informative subset of attributes from high-dimensional input data. By considering both model and data efficiency, we aim to develop CeANNs that are practical and accessible for real-world applications. In Chapter 6, we discuss the preliminary impact and limitations of this work, as well as potential directions for future research in the field. We hope that this Ph.D. thesis will pave the way toward designing cost-effective artificial neural networks.