Open Access
June 2017
Clustering correlated, sparse data streams to estimate a localized housing price index
You Ren, Emily B. Fox, Andrew Bruce
Ann. Appl. Stat. 11(2): 808-839 (June 2017). DOI: 10.1214/17-AOAS1019


Understanding how housing values evolve over time is important to policy makers, consumers and real estate professionals. Existing housing indices are computed at a coarse spatial granularity, such as metropolitan regions, which can mask or distort price dynamics apparent in local markets, such as neighborhoods and census tracts. A challenge in moving to estimates at, for example, the census tract level is the scarcity of spatiotemporally localized house sales observations. Our work aims to address this challenge by leveraging observations from multiple census tracts discovered to have correlated valuation dynamics. Our proposed Bayesian nonparametric approach builds on the framework of latent factor models to enable a flexible, data-driven method for inferring the clustering of correlated census tracts. We explore methods for scalability and parallelizability of computations, yielding a housing valuation index at the level of census tract rather than zip code, and on a monthly basis rather than quarterly. We demonstrate our approach on a large Seattle metropolitan housing dataset.


You Ren, Emily B. Fox, Andrew Bruce. "Clustering correlated, sparse data streams to estimate a localized housing price index." Ann. Appl. Stat. 11 (2) 808-839, June 2017.


Received: 1 April 2015; Revised: 1 December 2016; Published: June 2017
First available in Project Euclid: 20 July 2017

zbMATH: 06775894
MathSciNet: MR3693548
Digital Object Identifier: 10.1214/17-AOAS1019

Keywords: Bayesian nonparametrics, clustering, housing price index, multiple time series, state space models

Rights: Copyright © 2017 Institute of Mathematical Statistics
