Master thesis

Human Factors



In a recent study, we compared the relevance of different measures derived from the EEG for assessing the vigilance state of individuals. The central idea was to determine which analysis method is most effective in predicting lapses of attention, which, for example in driving conditions, may lead to serious accidents. We employed ERPs, Fourier analyses, and ERD/ERS. The research paradigm we employed, however, may not have been the most effective. The goal of this MA project is to develop an improved paradigm, which might simply imply that more non-target stimuli are presented, and which may enable the use of the recently developed LPS method (Van der Lubbe & Utzerath, 2013).



Introduction. The core of the instruction method currently used for driving lessons in the Netherlands has been unchanged for many years. This method takes a retrospective approach to instructing driving students. First, the driving instructor lets the student experience a driving situation, for example a busy intersection where the student needs to detect, and stop for, a crossing cyclist. Then, after the situation has occurred, the instructor reflects on the situation with the student: he walks through it step by step and lets the student reflect on his or her actions. Driving instructors (and their students) have experienced that this approach can be problematic, mainly in areas where highly complex situations follow each other in rapid succession, such as large cities. Before the instructor even has the chance to address the situation just experienced, another complex situation is already presenting itself. This can cause stress and very high workload in students, hindering both performance and the learning process.

Recently, a new method has entered the driving school market. Driving school Brunen in Enschede is the first to use this new method, among a few of their instructors. This method anticipates the upcoming traffic situation and guides the student toward it. The instructor furthermore approaches the instructions from the student's perspective, giving students the opportunity to internalize the information faster. Although the method is theoretically supported, it is unclear what its real-world effects on students are.

Aim. The aim of this study is to evaluate a new instructional method for driving schools and compare it to the traditional method. The study takes a student-centered approach and examines the method's effects on stress and workload during lessons, as well as on learning efficiency.

Your role: In this project, you will set up an evaluation of the traditional and new instruction methods. This may include, for example, an in-car observation study and/or structured interviews with students. As driving school Brunen is the first to use this method, you will work closely with them and their instructors to set up your study. After obtaining the relevant data, you will analyze the data and write a report on the evaluation. This study will be carried out at the UT.



Aim and Description:

If we present a user with an AI system that explains how it works, how do we know whether the explanation works and the user has achieved a pragmatic understanding of the AI? In other words, how do we know that an explainable AI (XAI) system is any good? Our focus in this project is on the key concepts of measurement. To this end, we will build upon and further extend the recent work by Hoffman et al. (2018) on developing metrics for explainable AI. We will further develop and test the questionnaires for explanation goodness and satisfaction. Next, we will develop metrics for measuring the user's understanding of an AI system, i.e., their mental model of the AI system. We will examine which mental model elicitation task aligns with measures of performance, so that such performance measures might be used as a surrogate for mental model analysis.

Possible research questions:

What is the validity of the Explanation Goodness Checklist and the Explanation Satisfaction Scale developed by Hoffman et al. (2018)?

What metrics for measuring mental models of AI systems are the most efficient?

What metrics for measuring mental models of AI systems align with relevant performance measures?

How can we best measure trust and curiosity as process measures?

What are the most relevant performance measures in an XAI context?


The student will initially have to acquire a good grasp of the various metrics already developed in the context of XAI, as well as in the more general literature on mental models and trust in automated systems. In order to scope the research, the paper by Hoffman et al. (2018) will serve as a starting point. The metrics already developed by these researchers will be further tested in the context of a specific case in which XAI will be tested.

An experiment will be carried out in which good and poor explanations of various software, algorithms, or tools are presented to participants. Participants' satisfaction with these explanations will be assessed using the Explanation Satisfaction Scale developed by Hoffman et al. (2018). In addition, the accuracy of their mental model of the software, algorithm, or tool will be assessed using a range of mental model elicitation tasks. Next, measures of trust and curiosity will be developed and tested. Finally, participants' performance when using the software, algorithm, or tool needs to be assessed and correlated with their mental model accuracy.
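The final analysis step described above can be sketched in a few lines. The snippet below is an illustrative example only; the variable names and the toy data are hypothetical, and a real analysis would use validated scores from the elicitation tasks.

```python
# Illustrative sketch: correlating mental model accuracy with task performance.
# All data below are made-up placeholder values, not study results.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-participant scores (e.g., proportion correct on an
# elicitation task vs. proportion of correct decisions made with the tool).
mental_model_accuracy = [0.4, 0.55, 0.6, 0.7, 0.9]
task_performance      = [0.5, 0.5, 0.65, 0.8, 0.85]
r = pearson_r(mental_model_accuracy, task_performance)  # strong positive r for this toy data
```

In practice, a statistics package (e.g., SPSS, R, or scipy) would be used, and rank-based correlations may be preferable if the accuracy scores are ordinal.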

The challenge in this project is to come up with a suitable AI tool or algorithm that may be used with naïve participants. Alternatively, explanations of systems that participants are familiar with may be used.


Hoffman, R. R., Mueller, S. T., Klein, G., & Litman, J. (2018). Metrics for explainable AI: Challenges and prospects. arXiv preprint arXiv:1812.04608.

Miller, T. (2017). Explanation in artificial intelligence: Insights from the social sciences. arXiv preprint arXiv:1706.07269.



The SpeakCount App

The Challenge

SpeakCount is an application that helps self-moderated teams increase their productivity during meetings. The developers of this app hope that SpeakCount can facilitate Engaging, Productive, Invigorating and Creative (E.P.I.C.) meetings. But can it really work as expected? Does it actually add value? Now it is your challenge to test it with experiments.

Background information

The application builds on the concept of participatory sense-making [8], as teams are asked to go through a process in which “meaning is generated and transformed in the interplay between the unfolding interaction process and the individuals engaged in it”.

More specifically: SpeakCount consists of an interactive puzzle with pieces in different shapes, projected on a surface visible to all team members, and an application installed on each member's smartphone. The application is meant to be used during a team's self-moderated meetings; it tracks the voice of each smartphone user and counts the time each of them talks during a meeting. It also provides a button with which individuals can positively evaluate each other's contributions during a meeting. The meeting has to be initiated through a website, and all participants should be connected to it via the app on their smartphones. Once the meeting starts, an empty puzzle appears on the team screen and each user receives 5 of the puzzle pieces on their smartphone interface.

During the meeting, the algorithm calculates the current percentage of each person's participation in the overall meeting and automatically fills in the corresponding puzzle pieces from the user's smartphone. At the same time, whenever two or more users tap the (like) button on their smartphones while a third person is talking, an extra puzzle piece appears on the team puzzle screen. At the end of the meeting, the result is either a complete or a partly complete puzzle, and the users can interpret the result on their own.
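The puzzle-filling logic described above can be sketched roughly as follows. This is a hypothetical reconstruction based only on the description, not SpeakCount's actual implementation; the function names, the 5-pieces-per-member allocation, and the two-liker threshold are assumptions taken from the text.

```python
# Hypothetical sketch of the SpeakCount puzzle-filling logic.
# Assumptions: 5 pieces per member; a bonus piece requires >= 2 likers.

def filled_pieces(speaking_seconds, pieces_per_member=5):
    """Map each member's share of total speaking time to filled puzzle pieces."""
    total = sum(speaking_seconds.values())
    if total == 0:
        return {member: 0 for member in speaking_seconds}
    return {member: round(pieces_per_member * t / total)
            for member, t in speaking_seconds.items()}

def bonus_pieces(like_events, min_likers=2):
    """Count extra pieces: one for each moment at which at least
    min_likers tapped the button while a third person was talking."""
    return sum(1 for likers in like_events if len(likers) >= min_likers)
```

For example, a member who produced 80% of the speaking time would see 4 of their 5 pieces filled, while a silent member would see none.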

More information:

Research Question

The main research objective is to determine the effectiveness of the app. To what extent does the app improve the productivity of self-moderated meetings and increase participants' satisfaction with a meeting?

Two dimensions could be explored within the scope of the research:

  • Different kinds of meetings: For which types of meetings can the application be beneficial (brainstorming, decision making, …)?
  • Longitudinal effectiveness: What are the effects for teams that use the app often in their meetings?

Initiators of SpeakCount:

Syn(+) Ergasia Team

Athina Kapousouz                
Cynthia Chen                        
Georgios Papanikolaou        
Jessica Ren Yueying            

Related literature

  1. Bersin, J., McDowell, T., Rahnema, A., & Van Durme, Y. (2017). The organization of the future: Arriving now. Global human capital trends 2017: Rewriting the rules for the digital age, 19-28.
  2. Duhigg, C. (2016). What Google learned from its quest to build the perfect team. The New York Times Magazine, 26, 2016.
  3. Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N., & Malone, T. W. (2010). Evidence for a collective intelligence factor in the performance of human groups. Science, 330(6004), 686-688.
  4. Edmondson, A. (1999). Psychological safety and learning behavior in work teams. Administrative science quarterly, 44(2), 350-383.
  5. Lehmann-Willenbrock, N., Allen, J. A., & Belyeu, D. (2016). Our love/hate relationship with meetings: Relating good and bad meeting behaviors to meeting outcomes, engagement, and exhaustion. Management Research Review, 39(10), 1293-1312.
  6. Buengeler, C., Klonek, F., Lehmann-Willenbrock, N., Morency, L. P., & Poppe, R. (2017). Killer Apps: Developing Novel Applications That Enhance Team Coordination, Communication, and Effectiveness. Small group research, 48(5), 591-620.
  7. Hamari, J., Koivisto, J., & Sarsa, H. (2014, January). Does gamification work?--a literature review of empirical studies on gamification. In System Sciences (HICSS), 2014 47th Hawaii International Conference on (pp. 3025-3034). IEEE.
  8. De Jaegher, H., & Di Paolo, E. (2007). Participatory sense-making. Phenomenology and the Cognitive Sciences, 6(4), 485-507.
  9. Fletcher-Watson, S., De Jaegher, H., van Dijk, J., Frauenberger, C., Magnée, M., & Ye, J. (2018). Diversity computing. Interactions, 25(5), 28–33. doi:10.1145/3243461


Studies on working memory load have revealed that increasing load in the N-back task leads to diminished performance and an increase in frontal theta activity (measured with the EEG). In the current project, a memory search task originally developed by Sternberg (e.g., see Van Gerven et al., 2004) will be used with varying loads of the memory set (1, 2, or 4 items). The idea is to present the three loads across a total of 30 blocks, in order to examine how decreased vigilance affects behavioral performance and how this is reflected in changes in frontal theta, as well as posterior alpha activity.

The main tasks of this project are to define hypotheses and to set up an experimental procedure to gather data. This includes the development of software to run the task and the use of a combination of biofeedback techniques to gather data.
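As an illustration of what the task software would need to generate, the sketch below builds a trial list for a Sternberg-style memory search session with the three loads (1, 2, or 4 items) spread over 30 blocks. The stimulus pool (consonants), the number of trials per block, and the 50% target probability are assumptions for the sketch, not specifications from the project.

```python
# Hypothetical trial-list generator for a Sternberg memory search session.
# Assumed: consonant stimuli, 20 trials per block, 50% target probability.
import random

LOADS = [1, 2, 4]                      # memory-set sizes from the project description
STIMULI = list("BCDFGHJKLMNPQRSTVWXZ")  # assumed stimulus pool

def make_block(load, n_trials=20, p_target=0.5, rng=random):
    """One block: each trial has a memory set of `load` items and a probe
    that is either a member of the set (target) or not (non-target)."""
    trials = []
    for _ in range(n_trials):
        memory_set = rng.sample(STIMULI, load)
        if rng.random() < p_target:
            probe, is_target = rng.choice(memory_set), True
        else:
            probe = rng.choice([s for s in STIMULI if s not in memory_set])
            is_target = False
        trials.append({"memory_set": memory_set, "probe": probe, "target": is_target})
    return trials

def make_session(n_blocks=30, rng=random):
    """30 blocks, 10 per load, shuffled so that time-on-task (vigilance)
    is not confounded with memory load order."""
    order = LOADS * (n_blocks // len(LOADS))
    rng.shuffle(order)
    return [(load, make_block(load, rng=rng)) for load in order]
```

In the actual project this logic would live inside a stimulus-presentation framework (e.g., PsychoPy or E-Prime) synchronized with the EEG recording.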


  • Van Gerven, P. W., Paas, F., Van Merriënboer, J. J., & Schmidt, H. G. (2004). Memory load and the cognitive pupillary response in aging. Psychophysiology, 41(2), 167-174.
  • Sternberg, S. (1966). High-speed scanning in human memory. Science, 153, 652–654.
  • Sternberg, S. (1967). Retrieval of contextual information from memory. Psychonomic Science, 8, 55–56.


Conversational interfaces and agents (i.e., chatbots and voice interfaces) can be used to support the customer experience with services. However, it is still hard to identify the real application and usefulness of conversational interfaces (CIs) in supporting information retrieval from the end-user point of view. In particular, the use of CIs is often presented as a way to enable users to interact in natural language; nevertheless, people (as adaptive agents) may quickly learn to minimize their wording during conversations with chatbots by acquiring a basic set of “command lines” to speed up task completion.

Building upon previous research, your exploratory work will focus on how to evaluate voice control interfaces.

Your main tasks will include a literature analysis and a review of previous work in order to set up an experimental procedure and develop initial tools to evaluate interaction with conversational agents.


  • Coperich, K., Cudney, E., & Nembhard, H. Continuous Improvement Study of Chatbot Technologies using a Human Factors Methodology.
  • Duijst, D. (2017). Can we improve the user experience of chatbots with personalisation? MSc thesis, Information Studies, Amsterdam.
  • Følstad, A., & Brandtzæg, P. B. (2017). Chatbots and the new world of HCI. Interactions, 24(4), 38-42.
  • Hill, J., Ford, W. R., & Farreras, I. G. (2015). Real conversations with artificial intelligence: A comparison between human–human online conversations and human–chatbot conversations. Computers in Human Behavior, 49, 245-250.
  • Kuligowska, K. (2015). Commercial chatbot: Performance evaluation, usability metrics and quality standards of embodied conversational agents.
  • Xuetao, M., Bouchet, F., & Sansonnet, J.-P. (2009). Impact of agent’s answers variability on its believability and human-likeness and consequent chatbot improvements. Paper presented at the Proc. of AISB.
  • Kerly, A., Hall, P., & Bull, S. (2007). Bringing chatbots into education: Towards natural language negotiation of open learner models. Knowledge-Based Systems, 20(2), 177-185.
  • Hirschberg, J., & Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245), 261-266.
  • Shawar, B. A., & Atwell, E. (2007). Chatbots: are they really useful?. In LDV Forum (Vol. 22, No. 1, pp. 29-49).
  • McTear, M., Callejas, Z., & Griol, D. (2016). The conversational interface. New York: Springer, 10, 978-3.
  • Agostaro, F., Augello, A., Pilato, G., Vassallo, G., & Gaglio, S. (2005, September). A conversational agent based on a conceptual interpretation of a data driven semantic space. In AI* IA (Vol. 3673, pp. 381-392).

Heller, B., Proctor, M., Mah, D., Jewell, L., & Cheung, B. (2005, June). Freudbot: An investigation of chatbot technology in distance education. In EdMedia: World Conference on Educational Media and Technology (pp. 3913-3918). Association for the Advancement of Computing in Education (AACE).



Every day people use multiple technologies to perform complex tasks, such as buying products online, informing their decision making, or supporting their work activities. Several independent lines of evidence in the literature converge on the idea that multiple elements affect people's expectations toward the use of a technology, including individual attitudes, skills, and capabilities, as well as technology-related aspects such as a product's aesthetics and usability as perceived before use, fluency, brand, and price.

In many cases, (high-risk) processes depend on technology to deliver the appropriate service. It is perhaps reasonable to assume that the implicit agreement of this technology-driven world is that people trust the technology they use for task performance and decision making in terms of its performance, functionality, and reliability of outcomes. Trust toward technology does not arise immediately, but is built throughout the relationship between user and artefact. It is a set of beliefs about a product's characteristics (i.e., functioning, reliability, safety, etc.) that results from people's accumulated experience with different technologies over time. Users' overall trust is therefore strongly related to the concept of user experience: experience with (and exposure to) different products enables people to develop a set of general attitudes and beliefs toward those technologies, including overall trust.

Building on the literature and previous data on explicit measures of trust, this exploratory work will attempt to investigate trust toward new and unknown products using different methodologies (including, but not limited to, eye-tracking).

Correlations between implicit and explicit measures will be investigated to model:

  • the process of trust formation for new and unknown tools;
  • the factors that affect this process;
  • how trust may be biased by design and information presentation.


  • Borsci, S., Lawson, G., Salanitri, D., & Jha, B. (2016). When simulated environments make the difference: the effectiveness of different types of training of car service procedures. Virtual Reality, 20(2), 83-99. doi: 10.1007/s10055-016-0286-8
  • Corbitt, B. J., Thanasankit, T., & Yi, H. (2003). Trust and e-commerce: a study of consumer perceptions. Electronic Commerce Research and Applications, 2(3), 203-215.
  • Fruhling, A. L., & Lee, S. M. (2006). The influence of user interface usability on rural consumers' trust of e-health services. International Journal of Electronic Healthcare, 2(4), 305-321. doi: 10.1504/ijeh.2006.010424
  • Gefen, D. (2000). E-commerce: the role of familiarity and trust. Omega, 28(6), 725-737.
  • Karat, C. M., Brodie, C., Karat, J., Vergo, J., & Alpert, S. R. (2003). Personalizing the user experience on IBM Systems Journal, 42(4), 686-701. doi: 10.1147/sj.424.0686
  • Lankton, N. K., & McKnight, D. H. (2011). What does it mean to trust facebook?: examining technology and interpersonal trust beliefs. SIGMIS Database, 42(2), 32-54. doi: 10.1145/1989098.1989101
  • Lawson, G., Salanitri, D., & Waterfield, B. (2016). Future directions for the development of virtual reality within an automotive manufacturer. Applied Ergonomics, 53(Part B), 323-330.
  • Lippert, S. K., & Swiercz, P. M. (2005). Human resource information systems (HRIS) and technology trust. Journal of Information Science, 31(5), 340-353. doi: 10.1177/0165551505055399
  • Roy, M. C., Dewit, O., & Aubert, B. A. (2001). The impact of interface usability on trust in Web retailers. Internet Research, 11(5), 388-398. doi: 10.1108/10662240110410165
  • Mcknight, D. H., Carter, M., Thatcher, J. B., & Clay, P. F. (2011). Trust in a specific technology: An investigation of its components and measures. ACM Trans. Manage. Inf. Syst., 2(2), 1-25. doi: 10.1145/1985347.1985353
  • Montague, E. N. H., Winchester, W. W., & Kleiner, B. M. (2010). Trust in medical technology by patients and healthcare providers in obstetric work systems. Behaviour & Information Technology, 29(5), 541-554. doi: 10.1080/01449291003752914
  • Pennington, R., Wilcox, H. D., & Grover, V. (2003). The Role of System Trust in Business-to-Consumer Transactions. Journal of Management Information Systems, 20(3), 197-226. doi: 10.1080/07421222.2003.11045777
  • Salanitri, D., Hare, C., Borsci, S., Lawson, G., Sharples, S., & Waterfield, B. (2015). Relationship Between Trust and Usability in Virtual Environments: An Ongoing Study. In M. Kurosu (Ed.), Human-Computer Interaction: Design and Evaluation: 17th International Conference, HCI International 2015, Los Angeles, CA, USA, August 2-7, 2015, Proceedings, Part I (pp. 49-59). Cham: Springer International Publishing.
  • Salanitri, D., Lawson, G., & Waterfield, B. (2016). The Relationship Between Presence and Trust in Virtual Reality. Paper presented at the Proceedings of the European Conference on Cognitive Ergonomics, Nottingham, United Kingdom.
  • Shin, D.-H. (2013). User experience in social commerce: in friends we trust. Behaviour & Information Technology, 32(1), 52-67. doi: 10.1080/0144929x.2012.692167
  • Ziefle, M., Rocker, C., & Holzinger, A. (2011, 18-22 July 2011). Medical Technology in Smart Homes: Exploring the User's Perspective on Privacy, Intimacy and Trust. Paper presented at the 2011 IEEE 35th Annual Computer Software and Applications Conference Workshops.


Dr. Martin Schmettow, Assistant Professor Cognitive Psychology and Ergonomics (CPE), University of Twente,

Marleen Groenier, PhD, Educational Researcher Lab for Professional Learning, TechMed Centre, University of Twente,


Frank R. Halfwerk, MD MSc, Technical Physician in Cardio-thoracic Surgery, Thorax Center Twente, Medisch Spectrum Twente, Enschede,

Prof. Jan G. Grandjean, MD PhD, cardio-thoracic surgeon, Thorax Center Twente, Medisch Spectrum Twente, Enschede,


Patients with coronary artery disease have a reduced blood flow to the heart. This may cause discomfort such as angina pectoris (chest pain) and can lead to a myocardial infarction or even death.

To recover blood flow, a Percutaneous Coronary Intervention (PCI) or Coronary Artery Bypass Graft (CABG) can be performed. In the Netherlands, a CABG is performed over 10,000 times annually. The majority of CABG procedures are performed with a heart-lung machine that takes over the circulation of blood to the organs while the heart is stopped. This comes with a risk of kidney or liver failure and even cerebrovascular accidents.

To avoid these complications, a CABG can be executed as a beating-heart procedure. This Off-Pump Coronary Artery Bypass (OPCAB) procedure shows fewer of these complications, yet it is difficult for heart surgeons to learn and therefore not available to all patients. This might lead to incomplete revascularization of the heart and thus suboptimal therapy for patients.

Research Problem

At the moment, the only way to learn OPCAB is on patients. In this learning phase, more complications, such as conversions to on-pump CABG and myocardial infarction, are observed. As a result, the fraction of OPCAB relative to CABG surgery has been declining in recent years, despite its advantages for several patient groups.

In a collaboration between the University of Twente and Medisch Spectrum Twente hospital, an OPCAB simulator is under development. Unfortunately, it is not entirely clear what the crucial factors for learning OPCAB are and how these could best be implemented in the OPCAB simulator.

Aim of this research

What are the crucial aspects to implement for an effective simulator-based OPCAB training?

Important subquestions are:

  1. How is OPCAB currently performed (in cooperation with a Technical Medicine student)?
  2. What kind of training facilities to learn OPCAB currently exist?
  3. How would experienced OPCAB surgeons want to disseminate their knowledge and skills to others?
  4. How would inexperienced OPCAB surgeons want to learn an OPCAB procedure?


A literature review and field research are necessary for questions 1 and 2. For questions 3 (and 4), interviews and focus groups are desired. Experts on human factors and cardiac surgery within Medisch Spectrum Twente are available, as well as a national network of cardiac surgeons.

General Remarks

  • There is also a quantitative Master's thesis available on this topic.
  • This Master's thesis is available as a 35 ECTS (25+10) project including an internship.


Students who wish to do a 10 ECTS internship are also encouraged to choose this topic. A 10-week internship can be done at the department of cardio-thoracic surgery, Thorax Centrum Twente, Medisch Spectrum Twente hospital in Enschede.

Students will experience the daily routine in cardio-thoracic surgery, such as multidisciplinary meetings (and, for those who want, even the operating theatre), see the influence of human factors on the selection and outcome of surgery, and describe these influences in a practical internship project. The internship will be followed by a 25 ECTS Master's thesis.

Note: This project is considered an internal assignment (with/without internship), since there will not be an application process for the internship part.