The Annals of Applied Statistics

Unmixing Rasch scales: How to score an educational test

Maria Bolsinova, Gunter Maris, and Herbert Hoijtink

Full-text: Open access


One of the important questions in the practice of educational testing is how a particular test should be scored. In this paper we consider what an appropriate simple scoring rule should be for the Dutch as a second language test, which consists of listening and reading items. As in many other applications, the Rasch model, which allows the test to be scored with a simple sumscore, is too restrictive to adequately represent the data. In this study we propose an exploratory algorithm that clusters the items into subscales, each fitting a Rasch model, and thus provides a scoring rule based on the observed data. The scoring rule produces either a weighted sumscore based on equal weights within each subscale or a set of sumscores (one for each subscale). An MCMC algorithm that makes it possible to determine the number of Rasch scales constituting the test and to unmix these scales is introduced and evaluated in simulations. Using the results of the unmixing, we conclude that the Dutch language test can be scored with a weighted sumscore with three different weights.
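
To make the two forms of the scoring rule concrete, the following is a minimal sketch of how a weighted sumscore and a set of subscale sumscores could be computed once items have been assigned to scales. The function names, the item-to-scale assignment, and the weights are hypothetical illustrations, not values or code from the paper.

```python
from typing import List, Sequence


def subscale_sumscores(responses: Sequence[int],
                       scale_of_item: Sequence[int],
                       n_scales: int) -> List[int]:
    """Return one sumscore per subscale (number of correct items in each scale)."""
    if len(responses) != len(scale_of_item):
        raise ValueError("responses and scale_of_item must have the same length")
    scores = [0] * n_scales
    for x, k in zip(responses, scale_of_item):
        scores[k] += x
    return scores


def weighted_sumscore(responses: Sequence[int],
                      scale_of_item: Sequence[int],
                      scale_weights: Sequence[float]) -> float:
    """Return a weighted sumscore in which all items of a subscale share one weight."""
    if len(responses) != len(scale_of_item):
        raise ValueError("responses and scale_of_item must have the same length")
    return sum(scale_weights[k] * x for x, k in zip(responses, scale_of_item))


# Hypothetical example: six dichotomous items assigned to three scales.
responses = [1, 0, 1, 1, 0, 1]        # 1 = correct, 0 = incorrect
scale_of_item = [0, 0, 1, 1, 2, 2]    # assumed item-to-scale assignment
scale_weights = [1.0, 2.0, 3.0]       # assumed weights, one per scale

per_scale = subscale_sumscores(responses, scale_of_item, n_scales=3)
total = weighted_sumscore(responses, scale_of_item, scale_weights)
```

With equal weights for all scales, the weighted sumscore reduces to the simple sumscore that the ordinary Rasch model implies.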

Article information

Ann. Appl. Stat., Volume 10, Number 2 (2016), 925-945.

Received: May 2015
Revised: February 2016
First available in Project Euclid: 22 July 2016

Digital Object Identifier: doi:10.1214/16-AOAS919

Keywords: Educational testing; Markov chain Monte Carlo; mixture model; multidimensional IRT; one parameter logistic model; Rasch model; scoring rule


Bolsinova, Maria; Maris, Gunter; Hoijtink, Herbert. Unmixing Rasch scales: How to score an educational test. Ann. Appl. Stat. 10 (2016), no. 2, 925–945. doi:10.1214/16-AOAS919.



Supplemental materials

  • Supplement A: Supplement to “Unmixing Rasch scales: How to score an educational test.” We provide the proof of identification of the multi-scale Rasch model in Section 1, details of the Gibbs sampler for estimating the model in Section 2, details on approximating the likelihood of the model in Section 3, results of additional simulation studies in Section 4, and details on estimation of the model with fixed correlation parameters in Section 5.