Maintaining Human-AI Trust - Understanding Breakdowns and Repair
Esther Kox is a PhD student in the Department of Psychology of Conflict, Risk and Safety. (Co-)promotors are prof.dr. J.H. Kerstholt and dr.ir. P.W. de Vries from the Faculty of Behavioural, Management and Social Sciences, and dr. M.B. van Riemsdijk from the Faculty of Electrical Engineering, Mathematics and Computer Science.
People are increasingly working with Artificial Intelligence (AI) agents, whether as software-based systems such as AI chatbots and voice assistants, or embedded in hardware devices such as autonomous vehicles, advanced robots, and drones. The idea of Human-AI (H-AI) collaboration is promising, since humans and AI possess complementary skills that, when combined, can enhance performance beyond what either could achieve alone. The real challenge is not just determining which tasks are better suited to humans or to machines working independently, but finding ways to leverage their respective strengths through effective interaction. Working together towards a common goal requires good cooperation, coordination, and communication, and it is in these areas that the true challenges lie.
A key component in these activities is trust, as it allows individuals to depend on each other's contributions to complete tasks and achieve shared goals. More specifically, maintaining balanced trust (i.e., neither too much nor too little) is crucial for safe and effective H-AI collaborations. Finding this balance, a process known as trust calibration, should enable people to determine when to rely on AI agents and when to override them. To facilitate this, we need to understand how H-AI trust is built, breaks down, and recovers (i.e., the ‘trust lifecycle’). This dissertation focuses on how to maintain H-AI trust by examining how trust breaks down (i.e., trust violations) and the mechanisms through which it can be repaired.
In this thesis, I cover three types of trust violations, stemming from 1) inadequate abilities of the AI agent (errors), 2) unexpected behaviour without any explanation, and 3) priority misalignment. In other words, violations with respect to what an AI agent does, how it operates, and why it acts the way it does. Additionally, we examined the impact of various trust-repair mechanisms on the development of H-AI trust. We evaluated a preventative measure designed to mitigate potential trust issues by proactively communicating uncertainty (e.g., “environment detected as clear, with 80% certainty”) and reactive strategies that address trust violations after the incident, such as expressing regret (e.g., “I am sorry”) or providing explanations for anomalous behaviour. These strategies can be categorized as informational (e.g., uncertainty, explanations) or affective (e.g., regret), aiming either to improve the AI agent's interpretability or to restore trust through emotional engagement. In short, we investigate how the nature of a trust violation and different repair strategies influence the development of H-AI trust.
Data for these studies were obtained using a series of custom-designed, game-like virtual task environments simulating military scenarios in which participants carried out missions in collaboration with an AI agent, presented in various physical forms. In each study, we used repeated measures of H-AI trust to track how it changed over time.
Chapters 2 and 3 examine trust violations due to the inadequate abilities of the AI agent. In Chapter 2, participants were tasked with returning to base camp as fast as possible after running out of ammunition. Halfway through, the AI agent failed to warn the participant of an approaching enemy. Following this failure, the AI agent employed one of four trust-repair strategies: an explanation, an expression of regret, both combined, or neither. H-AI trust recovered only when the response included an expression of regret, and recovery was greatest when regret and an explanation were offered together.
Chapter 3 involves house searches in two abandoned buildings, supported by a small drone. Halfway through, the AI agent failed to warn the participant of a hazard. We studied the effects of uncertainty communication and apology (i.e., explanation plus regret), deployed before and after the trust violation, respectively. We conducted this study with both civilian and military samples to investigate whether the findings were consistent across participant groups. Results showed that (a) communicating uncertainty led to more trust, (b) incorrect advice from the agent led to a less severe decline in trust when that advice included an expression of uncertainty, and (c) after a trust violation, trust recovered significantly more when the agent offered an apology. The latter two effects were found only in the civilian sample.
Chapter 4 examines a trust violation due to unexpected behaviour and the AI agent’s failure to explain itself. Halfway through a reconnaissance mission, the AI agent detected a faster alternative route that had emerged due to changes in the environment (i.e., a river had dried up) and decided to deviate from the original plan. We studied the effect of transparency (i.e., regular status updates and an explanation for the deviation) and of outcome on trust and the participant’s workload. The main result was that transparency prevented a trust violation and contributed to higher levels of trust, without increasing subjective workload.
Chapter 5 examines a trust violation caused by priority misalignment. Halfway through the mission, the AI agent that was guiding the participant did not warn the participant in time about a hazard down the road. In one condition, it explained that this failure was due to an underperforming sensor. In the other condition, it explained that it had deliberately recommended the faster route over the safer one, reasoning that the rest of the team was waiting and that further delays could jeopardize both the team and the mission. Our findings suggest that trust violations resulting from deliberate choices are harder to repair than those resulting from errors.
By analysing the dynamics of trust during H-AI interaction, this research aims to inform the design of AI systems that promote calibrated trust in high-stakes environments. As AI agents gain decision-making authority in the physical and virtual world, they will increasingly face conflicting human values (e.g., privacy vs. safety, efficiency vs. empathy). As they become more autonomous and complex, moral considerations will play a larger role, and trust may be lost not only through malfunctions but also through miscommunication and misaligned values. The trustworthiness of an AI agent is no longer determined solely by what it can do, but also by how and why it does so. Our findings support the growing consensus that H-AI trust, much like interpersonal trust, is multidimensional, even if the moral dimensions are not yet as apparent in current interactions. As the complexity of H-AI trust grows, maintaining an appropriate level of trust becomes increasingly important. Designing and developing trustworthy AI agents for safe and effective H-AI collaborations therefore requires a systematic and multidisciplinary approach.