The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 11, Number 2 (2017), 808-839.
Clustering correlated, sparse data streams to estimate a localized housing price index
Understanding how housing values evolve over time is important to policy makers, consumers and real estate professionals. Existing housing indices are computed at a coarse spatial granularity, such as metropolitan regions, which can mask or distort price dynamics apparent in local markets, such as neighborhoods and census tracts. A challenge in moving to estimates at, for example, the census tract level is the scarcity of spatiotemporally localized house sales observations. Our work aims to address this challenge by leveraging observations from multiple census tracts discovered to have correlated valuation dynamics. Our proposed Bayesian nonparametric approach builds on the framework of latent factor models to enable a flexible, data-driven method for inferring the clustering of correlated census tracts. We explore methods for scalability and parallelizability of computations, yielding a housing valuation index at the level of census tract rather than zip code, and on a monthly basis rather than quarterly. We present our analysis on a large Seattle metropolitan housing dataset.
Received: April 2015
Revised: December 2016
First available in Project Euclid: 20 July 2017
Ren, You; Fox, Emily B.; Bruce, Andrew. Clustering correlated, sparse data streams to estimate a localized housing price index. Ann. Appl. Stat. 11 (2017), no. 2, 808-839. doi:10.1214/17-AOAS1019. https://projecteuclid.org/euclid.aoas/1500537725
- Supplement to “Clustering correlated, sparse data streams to estimate a localized housing price index”. We detail aspects of our MCMC sampler, including: (i) the required likelihood calculation via Kalman filtering variants and (ii) a parallel implementation of sampling the cluster memberships. We also include further synthetic data experiments and results from our Seattle City analysis, and specify the various settings used in our experiments. Finally, we provide additional details on our model selection, specification, and computations for the joint global trend analysis. A link to our code base and related housing data sources is included.