Using UT-JupyterLab for your research and teaching has many benefits, says associate professor of Data Science Maurice van Keulen. ‘The platform offers considerable computing power and saves me a lot of time, as many programming languages, such as Python and R, as well as applications have already been installed in it. It now takes less time to set up a practical. We also need fewer student assistants for the practicals, since much explanatory documentation is available on the platform.’
All UT employees and students can use UT-JupyterLab free of charge. Van Keulen gladly uses UT-JupyterLab for his research and teaching. ‘It is a computer infrastructure that offers a lot of software, much of it open source, and it also includes instructions for it. Everything has been brought together in a single environment. It is a cluster of computers that large numbers of people can use at the same time.’
Cleaning up and transforming data sets
As a data scientist, Van Keulen explains, he spends a lot of time on machine learning. ‘This requires using source code to clean up and transform my data set. Next, I write code which the computer can use to learn, for instance to create a predictive model. Lastly, I write code to evaluate how well my predictive model works.’
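The three steps Van Keulen describes (clean and transform, learn a model, evaluate it) can be sketched in a few lines of Python. This is only a minimal illustration, not his actual code: the toy records, the `clean`, `train`, `predict` and `evaluate` helpers, and the simple threshold 'model' are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw records: (hours_studied, passed); None marks a missing value.
raw = [(1.0, 0), (2.0, 0), (None, 1), (3.0, 0), (6.0, 1),
       (7.0, 1), (8.0, 1), (2.5, 0), (6.5, 1), (None, 0)]

def clean(rows):
    """Step 1: clean up the data set by dropping rows with a missing feature."""
    return [(x, y) for x, y in rows if x is not None]

def train(rows):
    """Step 2: 'learn' a threshold halfway between the two class means."""
    mean_fail = mean(x for x, y in rows if y == 0)
    mean_pass = mean(x for x, y in rows if y == 1)
    return (mean_fail + mean_pass) / 2

def predict(threshold, x):
    """Predict class 1 when the feature lies above the learned threshold."""
    return 1 if x > threshold else 0

def evaluate(threshold, rows):
    """Step 3: measure accuracy on rows that were not used for training."""
    correct = sum(predict(threshold, x) == y for x, y in rows)
    return correct / len(rows)

data = clean(raw)
train_rows, test_rows = data[:6], data[6:]  # simple hold-out split
threshold = train(train_rows)
accuracy = evaluate(threshold, test_rows)
print(f"threshold={threshold:.2f}, accuracy={accuracy:.2f}")
```

In practice each step would rely on existing packages and far larger data sets, which is where the computing power of a shared cluster comes in.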
Storing and sharing large data sets
‘To do all this I programme a lot and use a lot of software as well as packages built by others. Cleaning up data sets to enable machine learning requires a great amount of computing power. Machine learning also requires special hardware, such as a GPU, a special computer chip that speeds up machine learning. UT-JupyterLab is an excellent environment for all this. If I want to learn something new, I can go to work in UT-JupyterLab straightaway and find everything easily. I can also use it to store and share large data sets. In this way it only needs storing once.’
Preparing practicals takes less time
Van Keulen has also found UT-JupyterLab to have several advantages for teaching. ‘We have many courses in which our students learn to clean up and transform data and to make predictive models. In more advanced courses they also use imaging or natural language processing. They learn to work with a great variety of data types. Now that I use UT-JupyterLab in my teaching, I can prepare practicals a lot faster. JupyterLab is also compatible with Python notebooks. This is convenient for lecturers, as one can write code and combine it with textual explanations. This ensures that students learn what they need to learn.’
Faster marking
‘I can refer my students to UT-JupyterLab directly and tell them that this is the place where they will find the documentation. LISA has set it up to be user-friendly and low-threshold: everything has been carefully documented. I can place the data sets and sample code for students to use in the practical in UT-JupyterLab. When doing this in JupyterLab there’s no need to take into account that some students use Windows while others use Mac. Marking assignments is also faster, as everyone works on the basis of the same principle. I know how everyone has gone about their work. And given that it’s a computer cluster, you can work in it with many students at the same time.’
No more installation trouble
‘In 2013, my colleague Mannes Poel and I set up the Data Science course to teach the basics of Data Science to all UT students irrespective of their studies. This is currently a mandatory course in many studies, and when I teach it I refer students to UT-JupyterLab. In this course we work with a large number of student assistants who provide support in case of questions, such as about installation or software packages. Installation problems have greatly reduced since we started using Jupyter, and students can now easily find explanations on the platform. As a result, we need fewer student assistants.’
Satisfied students
Van Keulen has noted students’ satisfaction with JupyterLab. ‘When I first referred students to JupyterLab, one of them wrote in the chat: “Very cool. Everything comes pre-installed, and it’s much faster than working on my laptop.” Previously, students used Google Colab, but they no longer receive free student credits for that. JupyterLab is the perfect alternative!’
According to Van Keulen, the mathematics degree programmes also use UT-JupyterLab. ‘They want to increase their investment in the platform so that even more students can use it. It would be a good thing for our programme to do the same. I still hesitate to refer all of our students to JupyterLab, as this would require greater capacity. I also see more and more of our PhD researchers using it. The Faculty of Electrical Engineering, Mathematics and Computer Science has a High Performance Cluster shared by various research groups. It was formerly also used to do standard tasks, which isn’t really what it’s intended for. These can now be done in UT-JupyterLab.’
FAIR data management
Does data science education pay attention to data management yet? ‘It does. I pay attention to data governance, explaining among other things what data stewards do. I plan to expand this to include an explanation of FAIR data management. Already, FAIR data management receives greater coverage in the new Bachelor’s module “Data Science and Artificial Intelligence: Seeing through the Hype”.’ Where possible, Van Keulen does ‘open’ research. ‘When making research prototypes, we do what we can to ensure the software is publicly available and well-documented. If people want to compare results to ours, we help them do so. In our field of study, sharing code may be even more important than sharing data.’
‘Healthcare makes extensive use of Artificial Intelligence. One of my PhD researchers worked for some years on creating a data set on breast cancer, using data from ZGT hospital. At the moment we’re trying to make this data set “open” in the form of a public benchmark. This is tricky in view of the private nature of medical data. Because of this, we’re not making the data public, but other researchers will be able to send their source code to the hospital, where it will be run in the research environment. In this way the data remains securely with the hospital. The resultant model and its evaluation are then sent back to the fellow researchers.’
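The idea of sending code to the data, rather than data to the researcher, can be illustrated with a small Python sketch. It is a simplified assumption about how such a set-up might look, not a description of the hospital's actual research environment: the `DataHolder` class, its `run` method and the two submitted functions are hypothetical names, and the four records are made up.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, int]  # (feature, label); hypothetical stand-in for private records


class DataHolder:
    """Hypothetical data holder that keeps its rows private and runs submitted code locally."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = list(rows)

    def run(
        self,
        train: Callable[[List[Row]], float],
        score: Callable[[float, List[Row]], float],
    ) -> Dict[str, float]:
        half = len(self._rows) // 2
        model = train(self._rows[:half])
        # Only the fitted model and its score are returned; the rows themselves are not.
        return {"model": model, "score": score(model, self._rows[half:])}


# Code an external researcher might submit (illustrative only).
def train_threshold(rows: List[Row]) -> float:
    """Use the mean feature value of the training rows as a decision threshold."""
    return sum(x for x, _ in rows) / len(rows)


def accuracy(threshold: float, rows: List[Row]) -> float:
    """Fraction of rows whose label matches 'feature above threshold'."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)


holder = DataHolder([(1.0, 0), (9.0, 1), (2.0, 0), (8.0, 1)])
result = holder.run(train_threshold, accuracy)
print(result)
```

The researcher receives only the returned dictionary, which mirrors the arrangement in which the model and its evaluation are sent back while the data stays where it is.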
Applying UT-JupyterLab in research and education
In UT-JupyterLab the functionalities of Jupyter notebooks can be combined with various applications. A Jupyter notebook is an open-source web application that can be used to integrate one’s source code, comments, multimedia and visualisations in an interactive document. This helps make data comprehensible quickly. Applications include cleaning up and transforming research data, numerical simulation, statistical modelling, data visualisation, machine learning and data mining.
Learn more about UT-JupyterLab
More detailed information about UT-JupyterLab can be found in UT’s Research support service portal.