Bayesian Analysis

Predictions Based on the Clustering of Heterogeneous Functions via Shape and Subject-Specific Covariates

Garritt L. Page and Fernando A. Quintana

Full-text: Open access


We study players in the National Basketball Association, where each unit of observation is a functional curve: a realization of production measurements taken over the course of a player's career. The observed curves display substantial between-player heterogeneity, in the sense that some players produce fairly smooth curves while others produce much more erratic ones. We argue that this variability in curve shape is a feature that can be exploited to guide decision making, learn about the processes under study, and improve prediction. In this paper we develop a methodology that takes advantage of this feature when clustering functional curves. Individual curves are flexibly modeled using Bayesian penalized B-splines, while a hierarchical structure allows the clustering to be guided by the smoothness of individual curves. In effect, the hierarchical structure balances the desire to fit individual curves well against the need to produce meaningful clusters that guide prediction. Available covariate information is incorporated to guide the clustering of curves nonparametrically through a product partition model prior for a random partition of individuals. Clustering based on curve smoothness and subject-specific covariate information is particularly important for the two types of prediction of interest: completing a partially observed curve for an active player, and predicting the entire career curve for a player who has yet to play in the National Basketball Association.
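To make the idea of a covariate-guided product partition prior concrete, the following is a minimal sketch and not the authors' model. It assumes the general form in which the prior probability of a partition is proportional to a product, over clusters, of a cohesion term and a similarity term that depends on the covariates in that cluster. The cohesion used here is the Dirichlet-process-style choice mass × (|S| − 1)!, and the similarity function, the covariate values, and all function names are hypothetical choices made only for illustration.

```python
import math

def set_partitions(items):
    """Yield every partition of `items` as a list of clusters (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing cluster in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or open a new singleton cluster for it.
        yield [[first]] + partition

def cohesion(cluster, mass=1.0):
    """Dirichlet-process-style cohesion: mass * (|S| - 1)!."""
    return mass * math.factorial(len(cluster) - 1)

def similarity(cluster, x, scale=1.0):
    """Hypothetical similarity: decays with within-cluster covariate spread."""
    values = [x[i] for i in cluster]
    mean = sum(values) / len(values)
    return math.exp(-sum((v - mean) ** 2 for v in values) / scale)

def ppmx_prior(x, mass=1.0, scale=1.0):
    """Return {partition: prior probability} by enumerating all partitions."""
    weights = {}
    for partition in set_partitions(list(range(len(x)))):
        key = tuple(sorted(tuple(sorted(c)) for c in partition))
        w = 1.0
        for cluster in partition:
            w *= cohesion(cluster, mass) * similarity(cluster, x, scale)
        weights[key] = w
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Made-up standardized covariate values for four hypothetical players.
x = [0.0, 0.1, 2.0, 2.1]
prior = ppmx_prior(x)
matched = prior[((0, 1), (2, 3))]     # similar covariates grouped together
mismatched = prior[((0, 2), (1, 3))]  # dissimilar covariates grouped together
print(matched > mismatched)
```

Under these assumptions, the partition that groups players with similar covariate values receives more prior mass than one that groups dissimilar players, which is the sense in which covariates guide the clustering. Full enumeration is only feasible for a handful of individuals; it is used here purely to keep the sketch self-contained.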

Article information

Bayesian Anal., Volume 10, Number 2 (2015), 379-410.

First available in Project Euclid: 2 February 2015

Keywords: Product partition models; Nonparametric Bayes; Penalized splines; Hierarchical models; Right censored data; NBA player production curves


Page, Garritt L.; Quintana, Fernando A. Predictions Based on the Clustering of Heterogeneous Functions via Shape and Subject-Specific Covariates. Bayesian Anal. 10 (2015), no. 2, 379--410. doi:10.1214/14-BA919.

