Setting Standards in Small Samples
The PhD defence of Monika Vaheoja will take place (partly) online and can be followed via a live stream.
Monika Vaheoja is a PhD student in the research group Cognition, Data and Education (CDE). Her supervisor is prof.dr.ir. T.J.H.M. Eggen from the Faculty of Behavioural, Management and Social Sciences (BMS), and her co-supervisor is dr. N.D. Verhelst from Eurometrics.
Setting standards for exams is an ongoing process involving different stakeholders. In this thesis, the term ‘standard’ is used in its theoretical meaning: an agreement about the minimum knowledge, proficiency, ability, and aptitude required to pass an exam. Once this theoretical standard has been mapped onto an exam’s test-score scale, it becomes a ‘performance standard,’ a synonym for a cut score or minimum passing score on a given exam form.
Various manuals discuss the process of standard setting for exams, and critical reviews of standard-setting methods are available. Even so, little research exists on maintaining a standard on a new exam form in a small-sample context. The literature on maintaining equivalent scores across different exam forms is based mainly on large samples and Item Response Theory (IRT). In practice, however, for example in teacher-training programmes, large samples are often unavailable, and collecting more data might take years. That is not feasible, because it would mean postponing grading until enough data had been collected.
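To illustrate what equating exam forms involves, here is a minimal sketch of linear equating under classical test theory, the kind of non-IRT method the thesis contrasts with IRT equating. The function name `linear_equate` and the randomly-equivalent-groups design are assumptions for this example, not taken from the thesis; real anchor-test designs need more machinery.

```python
import statistics

def linear_equate(score_x, scores_x, scores_y):
    """Map a score on exam form X to the scale of form Y.

    Sketch of linear (mean-sigma) equating under classical test
    theory, assuming randomly equivalent groups took forms X and Y.
    """
    mx, sx = statistics.mean(scores_x), statistics.pstdev(scores_x)
    my, sy = statistics.mean(scores_y), statistics.pstdev(scores_y)
    # Match the first two moments: standardise on X, rescale to Y.
    return my + sy * (score_x - mx) / sx
```

With small samples, the moment estimates `mx`, `sx`, `my`, `sy` are noisy, which is precisely why the variability of the resulting cut score matters.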
Chapter 3 therefore compared IRT equating with an equating method from classical test theory in a simulation study. As the results favoured IRT equating, Chapter 4 was designed to express theoretically the variability of the estimated cut score on the second exam form. This chapter was the most challenging part of the thesis, and further research on it is needed.

Chapter 5 evaluated a standard-setting process in which the standard was set on each exam form separately, using the Angoff or Cohen method. The results demonstrated an unfair cut score when different panels estimated the cut score on each exam form separately, and when the cut score was estimated separately with the Cohen method. To rule out the criticism that the panel’s composition might have caused this result, an experiment was conducted in which the composition of the expert panel was kept constant.

The final chapter of the thesis concerns feedback of the results to different stakeholders. The interpretation of an examinee’s results is one of the essential parts of the examination process: the different parts of the exam make it possible to identify the strengths and weaknesses of both the examinee and the teacher-training programmes. The thesis ends with practical advice for 10voordeleraar on setting standards for the teacher-training exams.
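For readers unfamiliar with the Cohen method mentioned above, a common variant sets the cut score at a fixed percentage of a high-performing reference score, corrected for guessing. The sketch below is illustrative only; the function name `cohen_cut_score`, the 60% factor, and the exact guessing correction are assumptions, as implementations of the method vary.

```python
def cohen_cut_score(scores, max_score, n_options=4, pct=0.60):
    """Cohen-style cut score: pct of the distance between the
    expected guessing score and the 95th-percentile score.
    (Illustrative sketch; details vary across implementations.)
    """
    guess = max_score / n_options  # expected score from blind guessing
    ranked = sorted(scores)
    # Score of the examinee at (approximately) the 95th percentile.
    p95 = ranked[min(len(ranked) - 1, int(round(0.95 * (len(ranked) - 1))))]
    return guess + pct * (p95 - guess)
```

Because the cut score depends only on the score distribution of the current cohort, applying the method to each exam form separately can shift the standard with the cohort, which is the fairness problem Chapter 5 examines.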