The Annals of Applied Statistics

Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions

Jie Peng and Hans-Georg Müller

Full-text: Open access

Abstract

We propose a distance between two realizations of a random process where for each realization only sparse and irregularly spaced measurements with additional measurement errors are available. Such data occur commonly in longitudinal studies and online trading data. A distance measure then makes it possible to apply distance-based analysis such as classification, clustering and multidimensional scaling for irregularly sampled longitudinal data. Once a suitable distance measure for sparsely sampled longitudinal trajectories has been found, we apply distance-based clustering methods to eBay online auction data. We identify six distinct clusters of bidding patterns. Each of these bidding patterns is found to be associated with a specific chance to obtain the auctioned item at a reasonable price.

Article information

Source
Ann. Appl. Stat., Volume 2, Number 3 (2008), 1056-1077.

Dates
First available in Project Euclid: 13 October 2008

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1223908052

Digital Object Identifier
doi:10.1214/08-AOAS172

Mathematical Reviews number (MathSciNet)
MR2516804

Zentralblatt MATH identifier
1149.62053

Keywords
Bidder trajectory; clustering of trajectories; functional data analysis; metric in function space; multidimensional scaling

Citation

Peng, Jie; Müller, Hans-Georg. Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. Ann. Appl. Stat. 2 (2008), no. 3, 1056-1077. doi:10.1214/08-AOAS172. https://projecteuclid.org/euclid.aoas/1223908052

