The Annals of Applied Statistics

Unmixing Rasch scales: How to score an educational test

Maria Bolsinova, Gunter Maris, and Herbert Hoijtink

Abstract

One of the important questions in the practice of educational testing is how a particular test should be scored. In this paper we consider what an appropriate simple scoring rule should be for the Dutch as a second language test, which consists of listening and reading items. As in many other applications, the Rasch model, which allows the test to be scored with a simple sumscore, is too restrictive to adequately represent the data here. In this study we propose an exploratory algorithm that clusters the items into subscales, each fitting a Rasch model, and thus provides a scoring rule based on observed data. The scoring rule produces either a weighted sumscore based on equal weights within each subscale or a set of sumscores (one for each of the subscales). An MCMC algorithm that makes it possible to determine the number of Rasch scales constituting the test and to unmix these scales is introduced and evaluated in simulations. Using the results of unmixing, we conclude that the Dutch language test can be scored with a weighted sumscore with three different weights.
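
The two scoring rules described in the abstract can be illustrated with a minimal Python sketch. It assumes the items have already been assigned to subscales; the function names, the item-to-subscale assignment, and the weights below are hypothetical and chosen only for illustration. They are not the values estimated in the paper, and the sketch does not implement the MCMC unmixing algorithm itself.

```python
from typing import Dict, Sequence


def subscale_sumscores(responses: Sequence[int],
                       scale_of_item: Sequence[int]) -> Dict[int, int]:
    """Return one sumscore per subscale for a single examinee.

    responses[i] is the 0/1 score on item i; scale_of_item[i] is the
    (hypothetical) label of the subscale that item i belongs to.
    """
    if len(responses) != len(scale_of_item):
        raise ValueError("responses and scale_of_item must have equal length")
    scores: Dict[int, int] = {}
    for x, k in zip(responses, scale_of_item):
        scores[k] = scores.get(k, 0) + x
    return scores


def weighted_sumscore(responses: Sequence[int],
                      scale_of_item: Sequence[int],
                      weight_of_scale: Dict[int, float]) -> float:
    """Weighted sumscore: every item in a subscale shares that subscale's weight."""
    per_scale = subscale_sumscores(responses, scale_of_item)
    return sum(weight_of_scale[k] * s for k, s in per_scale.items())


# Illustrative data only: six items, three subscales, made-up weights.
responses = [1, 0, 1, 1, 0, 1]
scale_of_item = [0, 0, 1, 1, 2, 2]
weights = {0: 1.0, 1: 2.0, 2: 3.0}

per_scale = subscale_sumscores(responses, scale_of_item)
total = weighted_sumscore(responses, scale_of_item, weights)
```

With a single subscale and a weight of 1, the weighted sumscore reduces to the simple sumscore associated with the Rasch model.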

Article information

Source
Ann. Appl. Stat., Volume 10, Number 2 (2016), 925-945.

Dates
Received: May 2015
Revised: February 2016
First available in Project Euclid: 22 July 2016

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1469199899

Digital Object Identifier
doi:10.1214/16-AOAS919

Mathematical Reviews number (MathSciNet)
MR3528366

Zentralblatt MATH identifier
06625675

Keywords
Educational testing; Markov chain Monte Carlo; mixture model; multidimensional IRT; one parameter logistic model; Rasch model; scoring rule

Citation

Bolsinova, Maria; Maris, Gunter; Hoijtink, Herbert. Unmixing Rasch scales: How to score an educational test. Ann. Appl. Stat. 10 (2016), no. 2, 925–945. doi:10.1214/16-AOAS919. https://projecteuclid.org/euclid.aoas/1469199899


Supplemental materials

  • Supplement A: Supplement to “Unmixing Rasch scales: How to score an educational test.” We provide the proof of identification of the multi-scale Rasch model in Section 1, details of the Gibbs sampler for estimating the model in Section 2, details on approximating the likelihood of the model in Section 3, results of additional simulation studies in Section 4, and details on estimating the model with fixed correlation parameters in Section 5.