Assessment of learning in higher education is a critical concern for policy makers, educators, parents, and students. Assessing it appropriately is likely to require including constructed-response tests in the assessment system. We examined whether scoring costs and other concerns with using open-ended measures on a large scale (e.g., turnaround time and inter-reader consistency) could be addressed by machine grading the answers. Analyses with 1359 students from 14 colleges found that two human readers agreed highly with each other in the scores they assigned to the answers to three types of open-ended questions. These reader-assigned scores also agreed highly with those assigned by a computer. The correlations of the machine-assigned scores with SAT scores, college grades, and other measures were comparable to the correlations of these variables with the hand-assigned scores. Machine scoring did not widen differences in mean scores between racial/ethnic or gender groups. Our findings demonstrate that machine scoring can facilitate the use of open-ended questions in large-scale testing programs by providing a fast, accurate, and economical way to grade responses.
Digital Object Identifier: 10.1214/193940307000000392