Open Access
December 2019
Outline analyses of the called strike zone in Major League Baseball
Dale L. Zimmerman, Jun Tang, Rui Huang
Ann. Appl. Stat. 13(4): 2416-2451 (December 2019). DOI: 10.1214/19-AOAS1285


We extend statistical shape analytic methods known as outline analysis for application to the strike zone, a central feature of the game of baseball. Although the strike zone is rigorously defined by Major League Baseball’s official rules, umpires make mistakes in calling pitches as strikes (and balls) and may even adhere to a strike zone somewhat different from the one prescribed by the rule book. Our methods yield inference on geometric attributes (centroid, dimensions, orientation and shape) of this “called strike zone” (CSZ) and on the effects that years, umpires, player attributes, game situation factors and their interactions have on those attributes. The methodology consists of first using kernel discriminant analysis to determine a noisy outline representing the CSZ corresponding to each factor combination, then fitting existing elliptic Fourier and new generalized superelliptic models for closed curves to that outline, and finally analyzing the fitted model coefficients using standard methods of regression analysis, factorial analysis of variance and variance component estimation. We apply these methods to PITCHf/x data comprising more than three million called pitches from the 2008–2016 Major League Baseball seasons to address numerous questions about the CSZ. We find that all geometric attributes of the CSZ, except its size, became significantly more like those of the rule-book strike zone from 2008 to 2016, and that several player attribute/game situation factors had statistically and practically significant effects on many of them. We also establish that the pitch-to-pitch variation in the horizontal center, width and area of an individual umpire’s CSZ is smaller than the variation in those attributes among the CSZs of different umpires.
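The first two steps of the pipeline described above (kernel discriminant analysis to obtain a noisy outline, then fitting a closed-curve model to it) can be illustrated with a small sketch. This is a minimal pure-Python example under stated assumptions, not the authors' implementation. The simulated pitches, the zone bounds `ZONE_X`/`ZONE_Z`, the bandwidth `h`, the grid step and all function names are hypothetical. Equal-bandwidth isotropic Gaussian kernels with class priors set to the sample proportions are also an assumption. Finally, a plain Lamé superellipse whose exponent is chosen by grid search on a radial residual stands in for the paper's generalized superelliptic model and orthogonal distance fitting.

```python
import math
import random

# HYPOTHETICAL rule-book-like zone in feet (illustrative bounds, not from the paper).
ZONE_X = (-0.83, 0.83)  # horizontal location
ZONE_Z = (1.5, 3.5)     # vertical location

def simulate_pitches(n, seed=1):
    """Simulated called pitches: uniform locations, 'strike' iff inside the zone."""
    rng = random.Random(seed)
    strikes, balls = [], []
    for _ in range(n):
        x, z = rng.uniform(-2.0, 2.0), rng.uniform(0.5, 4.5)
        inside = ZONE_X[0] <= x <= ZONE_X[1] and ZONE_Z[0] <= z <= ZONE_Z[1]
        (strikes if inside else balls).append((x, z))
    return strikes, balls

def kernel_sum(px, pz, pts, h):
    """Unnormalised sum of isotropic Gaussian kernels (bandwidth h) at (px, pz)."""
    return sum(math.exp(-0.5 * ((px - x) ** 2 + (pz - z) ** 2) / h ** 2)
               for x, z in pts)

def kda_classify(strikes, balls, step=0.1, h=0.15):
    """Kernel discriminant rule on a grid over [-2, 2] x [0.5, 4.5].

    ASSUMPTION: priors equal sample proportions, so prior_k * fhat_k(p) equals
    kernel_sum_k(p) / (n * c) with a constant c shared by both classes, and the
    rule 'strike if prior_s * fhat_s > prior_b * fhat_b' compares raw sums.
    """
    m = int(round(4.0 / step)) + 1
    xs = [round(-2.0 + i * step, 10) for i in range(m)]
    zs = [round(0.5 + j * step, 10) for j in range(m)]
    grid = [[kernel_sum(x, z, strikes, h) > kernel_sum(x, z, balls, h) for x in xs]
            for z in zs]
    return xs, zs, grid

def outline_points(xs, zs, grid):
    """Noisy outline: strike cells having a ball (or off-grid) 4-neighbour."""
    pts = []
    for j, row in enumerate(grid):
        for i, is_strike in enumerate(row):
            if not is_strike:
                continue
            for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if not (0 <= a < len(xs) and 0 <= b < len(zs)) or not grid[b][a]:
                    pts.append((xs[i], zs[j]))
                    break
    return pts

def zone_attributes(xs, zs, grid, step=0.1):
    """Centroid, width, height and area of the grid cells classified as strikes."""
    cells = [(xs[i], zs[j]) for j, row in enumerate(grid)
             for i, s in enumerate(row) if s]
    cx = sum(x for x, _ in cells) / len(cells)
    cz = sum(z for _, z in cells) / len(cells)
    width = max(x for x, _ in cells) - min(x for x, _ in cells)
    height = max(z for _, z in cells) - min(z for _, z in cells)
    return {"centroid": (cx, cz), "width": width, "height": height,
            "area": len(cells) * step ** 2}

def fit_superellipse_exponent(outline, centre, a, b):
    """Grid-search r in a plain Lame superellipse |dx/a|^r + |dz/b|^r = 1
    (fixed centre and semi-axes), minimising the squared normalised radial
    residual (||(dx/a, dz/b)||_r - 1)^2 over the outline points.
    This is a simplified stand-in, not the paper's generalized model."""
    cx, cz = centre

    def loss(r):
        return sum(((abs((x - cx) / a) ** r + abs((z - cz) / b) ** r) ** (1.0 / r)
                    - 1.0) ** 2 for x, z in outline)

    return min((1.0 + 0.25 * k for k in range(1, 37)), key=loss)  # r in 1.25..10

strikes, balls = simulate_pitches(400)
xs, zs, grid = kda_classify(strikes, balls)
attrs = zone_attributes(xs, zs, grid)
outline = outline_points(xs, zs, grid)
r_hat = fit_superellipse_exponent(outline, attrs["centroid"],
                                  attrs["width"] / 2, attrs["height"] / 2)
print(attrs, r_hat)
```

On these simulated data the estimated outline should lie close to the hypothetical rectangle, with corners rounded by kernel smoothing, so the fitted exponent is expected to exceed 2 (more box-like than an ellipse). In the spirit of the methodology above, one would repeat the outline and fitting steps for each factor combination and then analyze the fitted coefficients with regression or analysis of variance.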



Dale L. Zimmerman, Jun Tang and Rui Huang. "Outline analyses of the called strike zone in Major League Baseball." Ann. Appl. Stat. 13(4): 2416–2451, December 2019.


Received: 1 January 2018; Revised: 1 May 2019; Published: December 2019
First available in Project Euclid: 28 November 2019

zbMATH: 07160945
MathSciNet: MR4037436
Digital Object Identifier: 10.1214/19-AOAS1285

Keywords: Elliptic Fourier model, kernel discriminant analysis, morphometrics, orthogonal distance fitting, shape analysis, superellipse

Rights: Copyright © 2019 Institute of Mathematical Statistics

Vol. 13 • No. 4 • December 2019