PhD Defence Jorine Vermeulen | Diagnostic mathematics assessment in the third grade

Due to the COVID-19 crisis the PhD defence of Jorine Vermeulen will take place (partly) online.

The PhD defence can be followed via a live stream.

Jorine Vermeulen is a PhD student in the research group Research Methodology, Measurement and Data Analysis (OMD). Her supervisor is prof.dr.ir. T.J.H.M. Eggen from the Faculty of Behavioural, Management and Social Sciences (BMS).

The purpose of this dissertation was to contribute to teachers’ formative classroom assessment practice through the design of diagnostic tasks. Cognitive diagnostic assessment in mathematics designed for formative purposes aims to collect response behaviour that is indicative of students’ mathematical thinking. The focus was on students’ written response behaviour and the associated mathematical thinking in third-grade subtraction and addition. Chapters 2, 3, and 4 evaluated the diagnostic value of response behaviour captured with the empty number line (ENL) (path 1). Chapters 5 and 6 focused on diagnosing students’ bridging errors in subtraction (path 2). The studies in this dissertation addressed the following research questions:

  1. What kind of response behaviour is considered diagnostically relevant for formative decision making in third grade mathematics?
  2. What features should diagnostic tasks have to obtain response behaviour that is considered relevant for teachers’ formative decision making in third grade mathematics?

Chapter 1 contains the general introduction to this dissertation. It also describes the origin of the Improving Classroom Assessment (ICA) project and the need for diagnostic assessment in mathematics education. Diagnostic assessment focuses on assessing domain-specific educational needs. Within the cycle of formative assessment, diagnostic assessment is a cyclic process that zooms in on students’ mathematical thinking and their cognitive strengths and weaknesses. Diagnostic tasks and frameworks are used to test diagnostic hypotheses and gather information about students’ specific educational needs (see Figure 1.1). Diagnostic frameworks are needed to analyse the assessment data and to interpret students’ response behaviour in terms of their thinking. Chapter 1 ends with an outline of the chapters within this dissertation.

Chapter 2 presents a diagnostic framework for multi-digit subtraction with the ENL. The diagnostic framework focuses on different perspectives for analysing ENL solutions and provides suggestions for classroom and individual interventions. A literature review resulted in three perspectives for analysing ENL solutions: (1) procedural and conceptual development of subtraction; (2) errors, misconceptions, and self-regulation; and (3) conceptual knowledge of the ENL. Subsequently, the diagnostic framework was refined through the analysis of 600 ENL solutions obtained from 30 third-grade students and through consultation with two teachers. The results showed that the three perspectives within the diagnostic framework are suitable for interpreting students’ ENL solutions and planning formative interventions. The ENL is particularly effective for diagnosing self-regulatory errors, but less useful for diagnosing systematic errors that are caused by buggy algorithms.

Chapter 3 focused on the benefits of tablet technology for collecting and analysing response behaviour in comparison to paper-and-pencil tasks. Tablet technology could, however, alter students’ response behaviour, which has consequences for the inferences made about students’ mathematical thinking. On two occasions, 123 Dutch third graders were assessed with either a paper-and-pencil task or a tablet task in which the ENL could be used voluntarily. Students in the tablet condition used abbreviated mental strategies more often than students in the paper condition. Nevertheless, a between-group ANCOVA showed no significant differences between tablet and paper in ENL frequency or task score. A within-group ANCOVA showed that students who completed the paper-and-pencil task during the first assessment occasion used the ENL less frequently in the tablet task during the second assessment occasion. Additionally, no significant differences in students’ scores were found during the first assessment occasion, while during the second assessment occasion students who had completed the paper task first scored significantly lower than students who had completed the tablet task first.

Chapter 4 explored the relationships between task beliefs about the ENL, mathematical ability, gender, and voluntary ENL use (ENL frequency) in multi-digit subtraction and addition. In total, 123 Dutch third-grade students and nine teachers from six schools participated in this study. A multilevel path analysis showed that task beliefs about the ENL mediated the relationship between students’ mathematical ability and ENL frequency. No gender differences were found in the multilevel path analysis. Finally, the results showed that task beliefs about the ENL and ENL frequency differed across classrooms. The chapter ends with a discussion of how teachers’ task beliefs about the ENL and classroom culture may influence students’ task beliefs and ENL frequency. In conclusion, this study illustrates that affective and conative processes can influence response behaviour and subsequent inferences about cognitive processes. Therefore, the influence of task beliefs should be considered when designing diagnostic assessment.

Chapter 5 evaluates how the diagnostic capacity of subtraction items is related to their characteristics. The item characteristics studied are item format (open-ended versus multiple-choice), problem type (bare number versus word problems), and various number features, such as the number of digits in the subtrahend and minuend. Diagnostic capacity is defined as the extent to which multi-digit subtraction items that require borrowing (e.g. 1000-680) elicit bridging errors, such as the smaller-from-larger error. Item response theory (IRT) was used to estimate item properties. Subsequently, the item properties were used in two separate ANOVAs to compare the diagnostic capacity of multiple-choice versus open-ended items, bare number versus word problems, and three categories of number features. As expected, multiple-choice items had a higher diagnostic capacity than open-ended items. More interestingly, the number of digits (n) in the subtrahend and minuend influenced the diagnostic capacity of the items. Items from the category 3/4n-3n, like 1000-680, had the highest diagnostic capacity, whereas items characterized as 3/4n-2n, such as 1000-20, had the lowest diagnostic capacity.
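
As an illustration of the kind of bridging error meant here, the following is a minimal sketch in Python. It assumes the common textbook reading of the smaller-from-larger error, in which a student subtracts the smaller digit from the larger digit in every column instead of borrowing. The function names are hypothetical and are not taken from the dissertation.

```python
def requires_borrowing(minuend: int, subtrahend: int) -> bool:
    """Return True if column-wise subtraction needs at least one borrow.

    Assumes 0 <= subtrahend <= minuend.
    """
    top = str(minuend)
    bottom = str(subtrahend).zfill(len(top))  # align the columns
    return any(int(t) < int(b) for t, b in zip(top, bottom))


def smaller_from_larger(minuend: int, subtrahend: int) -> int:
    """Simulate the assumed smaller-from-larger error.

    In every column the smaller digit is subtracted from the larger
    one, so no borrowing ever takes place.
    Assumes 0 <= subtrahend <= minuend.
    """
    top = str(minuend)
    bottom = str(subtrahend).zfill(len(top))
    digits = [abs(int(t) - int(b)) for t, b in zip(top, bottom)]
    return int("".join(str(d) for d in digits))


if __name__ == "__main__":
    for minuend, subtrahend in [(1000, 680), (1000, 20), (58, 23)]:
        print(
            f"{minuend}-{subtrahend}:",
            "correct =", minuend - subtrahend,
            "| with error =", smaller_from_larger(minuend, subtrahend),
            "| borrowing needed =", requires_borrowing(minuend, subtrahend),
        )
```

Under this assumption, the item 1000-680 yields 1680 instead of the correct 320. An item that needs no borrowing, such as 58-23, gives the correct answer either way and therefore cannot reveal the error.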

Chapter 6 explored how bridging errors in subtraction are related to students’ mathematical ability. The study involved 694 third-grade students and 35 teachers from 25 Dutch schools. Multilevel regression analyses showed that the number of bridging errors was positively related to students’ mathematical ability, after controlling for the total number of errors in subtraction. Thus, students whose errors included a high proportion of bridging errors had a relatively higher mathematical ability than students whose errors included a low proportion of bridging errors. This result implies that diagnosing bridging errors may help to identify where students stand within their mathematical development. The practical implications of this result for the design and use of diagnostic instruments are addressed in the discussion section.

Finally, Chapter 7 presents a general discussion of the results of each chapter, organised by the two main research questions shown above. Based on the findings in this dissertation and the methodological limitations of the studies, two lines of research are proposed. The first line of research focuses on combining design research and teacher training on the implementation of diagnostic assessment within a cycle of formative assessment. The second line of research focuses on the use of technology in diagnostic assessment. Lastly, it is argued that diagnostic assessment and formative decision making should be part of teacher training. This dissertation provides knowledge that can be used to teach teachers how to analyse students’ response behaviour on the ENL and how to diagnose and remediate bridging errors. Pre-service teacher training should focus on more general knowledge about which task and student characteristics can affect students’ response behaviour and which tasks are suitable for assessing students’ mathematical thinking.