The infrastructures that Google designed to manage huge amounts of data on large clusters of commodity machines, together with the availability of open source counterparts such as Hadoop (developed at Yahoo, among others), have changed the way we do research at the Database Group of the University of Twente. Several researchers use our own Hadoop cluster, or the Hadoop cluster at SURFsara, the Dutch supercomputer center, to perform large-scale data analysis. In the master's course "Managing Big Data", we teach our students to become the "data scientists" of the future.
Interestingly, the course covers many core computer science topics, all related to the management of Big Data: file systems (the Google File System), programming paradigms (MapReduce), programming and query languages (for instance Pig Latin), and new database paradigms (for instance BigTable). Students complete several small assignments and one big challenge. This year, they showed what they could do with 6 billion web pages provided by CommonCrawl, which resulted in a prize awarded by Peter Norvig, director of research at Google and one of the world's most renowned computer scientists and educators.