
PhD Defence Ardjan Zwartjes

Adaptive Naive Bayes Classification for Wireless Sensor Networks

Wireless Sensor Networks (WSNs) are networks of tiny devices equipped with sensors and wireless communication to observe an environment and to communicate about these observations. For some applications the observations themselves are the goal: all sampled data needs to be stored or transmitted to a central place in the network that can offload the data, for example over the internet. For other applications, such as fire detection and cold-chain quality control, the raw observations are not critical; the detection of events in the network is (e.g. the house is on fire). For this type of application, Machine Learning techniques are of interest. In a typical WSN, Machine Learning algorithms can be trained to derive the occurrence of an event from the abundance of data that can be observed, so it is not necessary to write code for all relevant conditions or to perform complex calibration. Historical data can be used to train the Machine Learning approach.


WSNs are a complex environment for application development. Many aspects that are critical for WSN applications have little relevance in more common computing environments. These aspects include distributed computations, energy constraints, strict memory limitations, dynamic network topologies, complexity of deployment and physical inaccessibility of hardware. All of these factors make careful selection of algorithms and a suitable application architecture critical. However, most Machine Learning research was not conducted with these aspects in mind, and as a result many Machine Learning techniques, in their basic form, are ill-suited for WSN applications.

This thesis demonstrates that the Naive Bayes classifier has a number of interesting properties for WSN applications, in contrast to Feed Forward Neural Networks and Decision Trees. All three algorithms require only limited computational power and memory. Naive Bayes classifiers, however, can be efficiently distributed, an aspect in which Feed Forward Neural Networks are severely limited. Furthermore, Naive Bayes works with meaningful partial results that can be independently combined into a classification result. As a consequence, a Naive Bayes classifier trained for a WSN can remain functional even if nodes leave the WSN and can be improved by adding nodes. Both Decision Trees and Feed Forward Neural Networks, on the other hand, depend heavily on the reliability of their inputs. Because of these factors the Naive Bayes classifier is a suitable algorithm for WSNs.
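
To illustrate how such partial results can be combined, the following Python sketch (with hypothetical node names, class names and probability values, none of which are taken from the thesis) lets each node contribute per-class log-likelihoods for its own observation; a fusion step adds whatever contributions arrive to the class priors, so nodes that have left the network simply contribute nothing.

```python
import math

# Hypothetical per-node model: each node holds an estimate of
# P(local observation | class) and reports only log-probabilities.
def node_contribution(local_likelihoods):
    """Return per-class log-likelihoods for this node's observation."""
    return {c: math.log(p) for c, p in local_likelihoods.items()}

def fuse(prior, contributions):
    """Add the received contributions to the class priors and pick the
    most likely class. Missing nodes simply add nothing, so the
    classifier degrades gracefully instead of failing."""
    score = {c: math.log(p) for c, p in prior.items()}
    for contrib in contributions:
        for c, logp in contrib.items():
            score[c] += logp
    return max(score, key=score.get)

# Example: two of three nodes report; the third is offline.
prior = {"fire": 0.01, "no_fire": 0.99}
node_a = node_contribution({"fire": 0.9, "no_fire": 0.01})  # temperature node
node_b = node_contribution({"fire": 0.7, "no_fire": 0.05})  # smoke node
print(fuse(prior, [node_a, node_b]))  # -> fire
```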

A challenge for any Machine Learning application on WSNs is the training of the Machine Learning algorithm. For Naive Bayes, supervised training can be applied to gather statistics about the data, which can then be used for probability estimation. A common approach for the supervised training of the Naive Bayes classifier is the division of the input space of each feature, or partial observation, into a number of intervals. Supervised training can then be applied to gather statistics about the distribution of the classes over those intervals. In order to limit the influence of noise and statistical anomalies, it is important that each interval contains a significant number of observations; otherwise small variations in the training set can have a large impact on the classification output. This means that simply dividing the input space into equal portions is not an optimal solution. In this thesis we demonstrate that unsupervised learning can be applied to create a suitable division of the input space. Multiple unsupervised learning algorithms are evaluated: Kohonen maps, K-means, P² and a custom Self Organising Map. Of those approaches, we demonstrate that the P² algorithm provides the most suitable division of the input space. This thesis demonstrates that the application of unsupervised training allows memory-efficient training of very accurate Naive Bayes classifiers.
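
As a rough sketch of this idea in Python (a batch quantile split stands in for the streaming P² estimator, and the feature values, class names and bin count are hypothetical), interval boundaries can be placed so that each interval receives roughly the same number of training observations, after which supervised counting yields the per-class interval probabilities:

```python
import numpy as np

def quantile_bins(samples, n_bins):
    """Place interval boundaries at empirical quantiles so that every
    interval holds roughly the same number of training observations
    (a batch stand-in for a streaming quantile estimator such as P²)."""
    qs = np.linspace(0, 1, n_bins + 1)[1:-1]
    return np.quantile(samples, qs)

def train_feature(samples, labels, edges, classes, alpha=1.0):
    """Count classes per interval (with Laplace smoothing) to estimate
    P(interval | class) for one feature."""
    bins = np.digitize(samples, edges)
    probs = {}
    for c in classes:
        counts = np.bincount(bins[labels == c], minlength=len(edges) + 1) + alpha
        probs[c] = counts / counts.sum()
    return probs

# Hypothetical example: a temperature feature for "fire" vs "no_fire".
rng = np.random.default_rng(0)
temp = np.concatenate([rng.normal(20, 3, 500), rng.normal(60, 10, 50)])
label = np.array(["no_fire"] * 500 + ["fire"] * 50)
edges = quantile_bins(temp, n_bins=8)
model = train_feature(temp, label, edges, classes=["fire", "no_fire"])
# At classification time: P(x | class) is model[class][np.digitize(x, edges)]
```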

In order to provide meaningful knowledge about a classification problem, Machine Learning algorithms need to be trained by providing examples of the desired classification output. The distributed nature of WSNs and, for some applications, the inaccessible environment in which sensor nodes are deployed make this task far from trivial. One approach is to train a generic classifier under lab conditions, without taking the specifics of the deployment location into account. In this approach, location-specific factors might have a negative impact on classification performance. Deployment-specific training is an alternative, which can be performed in two ways: online and offline. Online training means that the desired classification outputs are transmitted over the network to all sensor nodes; with these examples, classifiers are trained locally on the nodes. Offline training means transmitting the sampled sensor data to a central location where a classifier is trained for each sensor node; after the training phase these classifiers are transmitted over the network to each node.
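
The toy calculation below (Python; the assumption of one radio message per training example and per sensor sample is hypothetical and only serves to make the comparison concrete) counts the messages that the two deployment-specific training schemes would generate:

```python
def online_training_messages(n_nodes, n_examples):
    """Online: every desired classification output must reach every node
    (assuming one message per node per training example)."""
    return n_nodes * n_examples

def offline_training_messages(n_nodes, n_samples_per_node, model_messages=1):
    """Offline: every node ships its sampled data to a central location,
    which sends one trained classifier back per node (assuming one
    message per sample and per classifier)."""
    return n_nodes * n_samples_per_node + n_nodes * model_messages

# Hypothetical deployment: 50 nodes, 1000 training examples, 1000 samples per node.
print(online_training_messages(50, 1000))    # 50000 messages
print(offline_training_messages(50, 1000))   # 50050 messages
```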

Both of these approaches require transmitting a large number of messages, which consumes a lot of energy. To overcome this challenge, this thesis introduces QUantile Estimation after Supervised Training (QUEST), an adaptive Naive Bayes classifier. For QUEST, a generic classifier is trained under lab conditions and deployed to all sensor nodes. However, instead of disregarding the specifics of the deployment location, QUEST uses local observations and unsupervised training on each sensor node to continuously adapt the classifier to the new environment. This approach removes the communication required for training and has only a very limited effect on classification performance. As such, QUEST enables the efficient deployment of a WSN and reduces the manual maintenance required in case of battery depletion.
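
A minimal sketch of the adaptation step is given below in Python. The update rule is a generic stochastic quantile approximation standing in for the P² estimator described in the thesis, and all boundary values, quantile levels and observations are hypothetical: each interval boundary tracks a fixed quantile of the locally observed, unlabeled data, while the class statistics learned in the lab remain unchanged.

```python
def update_edges(edges, x, quantiles, step=0.01):
    """Label-free adaptation of the interval boundaries on a node.

    Each boundary tracks a fixed quantile of the locally observed data via
    a simple stochastic approximation (a lightweight stand-in for the P²
    estimator); the lab-trained class statistics stay untouched."""
    return [e + step * (q - (1.0 if x <= e else 0.0))
            for e, q in zip(edges, quantiles)]

# Hypothetical usage on a node: lab-trained boundaries drift towards the
# local sensor's distribution as unlabeled observations arrive.
edges = [18.0, 21.0, 24.0]          # boundaries learned under lab conditions
quantiles = [0.25, 0.50, 0.75]      # quantile each boundary represents
for x in [22.5, 23.1, 21.8, 25.0]:  # raw local observations, no labels
    edges = update_edges(edges, x, quantiles)
```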