The Annals of Applied Statistics

Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions

Jie Peng and Hans-Georg Müller

Full-text: Open access


We propose a distance between two realizations of a random process where for each realization only sparse and irregularly spaced measurements with additional measurement errors are available. Such data occur commonly in longitudinal studies and online trading data. A distance measure then makes it possible to apply distance-based analysis such as classification, clustering and multidimensional scaling for irregularly sampled longitudinal data. Once a suitable distance measure for sparsely sampled longitudinal trajectories has been found, we apply distance-based clustering methods to eBay online auction data. We identify six distinct clusters of bidding patterns. Each of these bidding patterns is found to be associated with a specific chance to obtain the auctioned item at a reasonable price.
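As a rough illustration of the workflow the abstract describes (pairwise distances first, then distance-based clustering and multidimensional scaling), here is a minimal Python sketch. It does not reproduce the distance proposed in the paper: the distance matrix `D` is a stand-in computed from made-up two-dimensional points, and the helper names `classical_mds` and `k_medoids` are hypothetical, chosen only for this example. Any symmetric matrix of distances between trajectories could take the place of `D`.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed n items in k dimensions from an n x n distance matrix (classical scaling)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]            # indices of the k largest eigenvalues
    w_k = np.clip(w[idx], 0.0, None)         # guard against small negative eigenvalues
    return V[:, idx] * np.sqrt(w_k)

def k_medoids(D, k, n_iter=100, seed=0):
    """Simple alternating k-medoids that only needs the distance matrix D."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # assign each item to its nearest medoid
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                # new medoid: member with the smallest total distance to its cluster
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids

# Made-up stand-in data: six items forming two well-separated groups.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
# Placeholder distance matrix (plain Euclidean), NOT the paper's distance.
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

labels, medoids = k_medoids(D, k=2)
embedding = classical_mds(D, k=2)
```

Both helpers take only the distance matrix as input, which is the point of a distance-based approach: once distances between sparsely observed trajectories are available, the raw observation times and values are no longer needed for clustering or scaling.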

Article information

Ann. Appl. Stat., Volume 2, Number 3 (2008), 1056-1077.

First available in Project Euclid: 13 October 2008

Keywords: Bidder trajectory; clustering of trajectories; functional data analysis; metric in function space; multidimensional scaling


Peng, Jie; Müller, Hans-Georg. Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. Ann. Appl. Stat. 2 (2008), no. 3, 1056–1077. doi:10.1214/08-AOAS172.


