The Complexities of Simplicity: Examining Heuristic Expert Evaluation of Municipal Websites
Marieke Welle Donker-Kuijer is a PhD student in the Department of Communication Science. Her promotors are prof.dr. M.D.T. de Jong of the Faculty of Behavioural, Management and Social Sciences and prof.dr. L.R. Lentz of Utrecht University.
This PhD thesis studies heuristic evaluation, a popular expert-focused evaluation method. The method relies on a set of usability guidelines, the heuristics, which experts use to systematically evaluate the usability of interfaces such as e-government websites. Heuristic evaluation is presented as a “simple” method that can be learned quickly and applied by evaluators with different levels of expertise and experience. However, studies show that variations in its operationalization, especially in the content and presentation format of the heuristics and in the evaluators’ level of experience, lead to differences in results. More insight into how the abstraction level of the heuristics and the evaluation experience of the evaluators influence the evaluation process and its results may help to increase the validity and reliability of heuristic evaluation.
The first study showed that e-government heuristics are very complicated documents, raising questions about their usability for evaluators. In the second study, sixteen communication professionals evaluated two parts of a municipal website, first via unguided evaluation and then via high-level or low-level heuristics. Using heuristics mostly narrowed the experts’ focus to the topics covered by the heuristics (limitation function) instead of helping them to discover more problems (enrichment function). The limitation function was also visible in the third study, in which sixty-three experts in three conditions (unguided, high-level heuristic, and low-level heuristic evaluation) evaluated four municipal websites; the unguided evaluation led to more and broader problem detections than the low-level heuristic evaluation. Increased evaluation experience led to more problem detections, without interaction effects between the variables. Through analyses of think-aloud protocols and reported problems of heuristic evaluations, the final study showed that heuristics played a role in only 27% of problem detections and that experts used a variety of knowledge resources. Nearly half of the verbalized problems were not reported, due both to the experts’ doubts about their validity and to their difficulties in working with the heuristics. Furthermore, the experts seemed to use the heuristics more to categorize problems than to discover them directly.
The complexity of heuristics, in terms of content structuring, formulations, and abstraction level, presents significant challenges for their usability. Especially in heuristics that cover multiple topics, e.g. for e-government sites and services, the complexity of the presentation format raises questions about how evaluators can use them efficiently. The studies in this thesis also show that the value of heuristics as problem detection aids appears to be limited compared to unguided expert evaluation. Rather than facilitating the identification of a large array of issues, the heuristics seemed to steer the experts toward the subset of problems that fit within the scope of the heuristics. This focusing effect can be very useful as a tool to direct attention to certain topics of interest, especially when working with very extensive evaluation objects, such as e-government websites, or when collaborating with multiple evaluators. The distinction between high-level and low-level heuristics is significant: low-level heuristics interfered more with the experts’ evaluation process than high-level heuristics, leading to much lower productivity and a stronger constraining of focus. The videos and think-aloud protocols showed experts in the low-level condition going back and forth between the heuristics document and the website, and flipping through the multiple pages of their heuristics. They found it difficult to develop an efficient way to combine the heuristics, their own expertise, and the inspection of the website. It is important to consider what these effects of high-level and low-level heuristics mean for other sets of heuristics. Some sets intentionally combine high-level principles and low-level checklists. These hybrid forms may offer a productive middle ground, preserving the flexibility of high-level heuristics for experts while ensuring that less experienced evaluators can find the extra guidance they need.
Furthermore, the interaction between the expertise of the expert and the expertise embedded in the heuristics is a crucial aspect of heuristic evaluation. Heuristics can serve as repositories of relevant design knowledge and simultaneously act as structuring mechanisms and memory devices. Both roles require interpretation of the heuristics and integration with the evaluator’s expertise. The studies in this thesis show, on the one hand, that the heuristics represented existing design expertise; on the other hand, some experts voiced disagreements with parts of the heuristics, which made it harder for them to apply those parts in their evaluations. High-level heuristics seem to leave more room for interpretation and integration with the expert’s existing expertise, making them the safer choice for more advanced evaluators.
Finally, there is an effect of experience, or learning effect, which is visible after only four evaluation rounds. This learning effect appears not only in experts working with high-level and low-level heuristics but also in experts performing unguided evaluations. There were no interaction effects between evaluation experience and type of evaluation. Methodologically, these results support the common practice of investigating the potential benefits of (new) heuristics in research designs that use trained but relatively inexperienced novices.