The Annals of Applied Statistics

Inferring constructs of effective teaching from classroom observations: An application of Bayesian exploratory factor analysis without restrictions

J. R. Lockwood, Terrance D. Savitsky, and Daniel F. McCaffrey



Ratings of teachers’ instructional practices using standardized classroom observation instruments are increasingly being used for both research and teacher accountability. There are multiple instruments in use, each attempting to evaluate many dimensions of teaching and classroom activities, and little is known about what underlying teaching quality attributes are being measured. We use data from multiple instruments collected from 458 middle school mathematics and English language arts teachers to inform research and practice on teacher performance measurement by modeling latent constructs of high-quality teaching. We make inferences about these constructs using a novel approach to Bayesian exploratory factor analysis (EFA) that, unlike commonly used approaches for identifying factor loadings in Bayesian EFA, is invariant to how the data dimensions are ordered. Applying this approach to ratings of lessons reveals two distinct teaching constructs in both mathematics and English language arts: (1) quality of instructional practices; and (2) quality of teacher management of classrooms. We demonstrate the relationships of these constructs to other indicators of teaching quality, including teacher content knowledge and student performance on standardized tests.

Article information

Ann. Appl. Stat., Volume 9, Number 3 (2015), 1484-1509.

Received: June 2014
Revised: March 2015
First available in Project Euclid: 2 November 2015

Keywords: Teaching quality; teacher value-added; Bayesian hierarchical models; ordinal data; latent variable models


Lockwood, J. R.; Savitsky, Terrance D.; McCaffrey, Daniel F. Inferring constructs of effective teaching from classroom observations: An application of Bayesian exploratory factor analysis without restrictions. Ann. Appl. Stat. 9 (2015), no. 3, 1484–1509. doi:10.1214/15-AOAS833.


