Unsupervised learning using NLP for recommendation clustering
Type: Bachelor EE/CS/HMI
If you are interested, please contact:
Telecommunications networks form a critical infrastructure. Outages due to incidents can have a severe impact on society: emergency services cannot be reached, traffic light installations may stop functioning and internet services can go down.
As part of a larger project, researchers at Universiteit Twente have collected recommendations from different parties within a telecommunications operator. These recommendations need to be assessed further by the relevant parties, but because of their volume (303) and their wide-ranging subjects, the set cannot be offered for review as a whole. The recommendations need to be clustered so that each individual reviewer receives a manageable number.
Beyond this practical need, the researchers think that there may be an underlying structure in the recommendations that can teach us something new about how businesses conduct their work and which blind spots exist in the processes that they employ for their day-to-day business.
To find the structure in these recommendations, we need a clustering algorithm. Given the secondary goal of the researchers, this calls for an unsupervised approach (as we do not yet know the underlying structure). The recommendations are formulated in plain English, which necessitates the application of NLP.
Your assignment is to develop an algorithm that can cluster the recommendations and to deliver the clusters themselves. This assignment has several parts:
- Cleanse the data (several recommendations are repeated; the number of repetitions may say something about the importance of the recommendation)
- Choose a set of applicable clustering algorithms
- Find an optimal number of clusters
- Compare the outcomes of the different algorithms
- Draw conclusions about the effectiveness of the algorithms
- Deliver the clusters of recommendations, optionally (for bonus points) with a generated description of each cluster.
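
To give an impression of what the steps above could look like, here is a minimal sketch in Python. It is only a possible starting point, not a prescribed method. The assignment does not name any of the choices made here: the use of scikit-learn, TF-IDF vectors, k-means, agglomerative clustering and the silhouette score are all assumptions. The sample recommendations are invented for illustration and do not come from the actual data set.

```python
from collections import Counter

from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def deduplicate(recommendations):
    """Collapse repeated recommendations; keep the repeat count as an importance hint."""
    # Normalise case and whitespace so trivially different copies count as repeats.
    counts = Counter(" ".join(r.lower().split()) for r in recommendations)
    texts = list(counts)
    return texts, [counts[t] for t in texts]


def cluster_and_score(texts, k_values):
    """Return {algorithm name: (best k, best silhouette score, labels)}."""
    # TF-IDF is one simple text representation; sentence embeddings are an alternative.
    dense = TfidfVectorizer(stop_words="english").fit_transform(texts).toarray()
    algorithms = {
        "kmeans": lambda k: KMeans(n_clusters=k, n_init=10, random_state=0),
        "agglomerative": lambda k: AgglomerativeClustering(n_clusters=k),
    }
    results = {}
    for name, make in algorithms.items():
        best = None
        for k in k_values:
            labels = make(k).fit_predict(dense)
            score = silhouette_score(dense, labels)  # higher means better separated
            if best is None or score > best[1]:
                best = (k, score, labels)
        results[name] = best
    return results


# Invented sample recommendations (hypothetical, for illustration only).
raw = [
    "Test the backup power supply every month",
    "test the backup power supply every month",
    "Install a second backup power generator",
    "Check the backup power batteries yearly",
    "Train staff on the incident escalation procedure",
    "Document the incident escalation procedure",
    "Review incident escalation contacts with staff",
]

texts, weights = deduplicate(raw)
results = cluster_and_score(texts, k_values=[2, 3])
for name, (k, score, labels) in results.items():
    print(name, "best k:", k, "silhouette:", round(score, 3), "labels:", list(labels))
```

The silhouette score is only one of several criteria for choosing the number of clusters, and on 303 short texts the outcome may depend strongly on the text representation. Comparing representations and criteria is part of the assignment.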