The Annals of Statistics

Generalized random forests

Susan Athey, Julie Tibshirani, and Stefan Wager

Abstract

We propose generalized random forests, a method for nonparametric statistical estimation based on random forests (Breiman [Mach. Learn. 45 (2001) 5–32]) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: nonparametric quantile regression, conditional average partial effect estimation and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
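
As a rough illustration of the weighting idea described in the abstract, the sketch below turns per-tree leaf memberships into adaptive weights and then solves two simple weighted moment equations: one for a conditional mean and one for a conditional quantile. It is a toy under stated assumptions and not the grf implementation. The leaf assignments are hard-coded hypothetical values rather than the output of a fitted forest, and the function names (`neighbor_weights`, `weighted_mean`, `weighted_quantile`) are invented for this example.

```python
import numpy as np


def neighbor_weights(leaf_ids_train, leaf_ids_x):
    """Average, over trees, of 1{same leaf as the query point} / leaf size.

    leaf_ids_train: (B, n) int array, leaf index of each training point per tree.
    leaf_ids_x:     (B,)   int array, leaf index of the query point per tree.
    """
    same_leaf = leaf_ids_train == leaf_ids_x[:, None]   # (B, n) boolean
    leaf_sizes = same_leaf.sum(axis=1, keepdims=True)   # (B, 1)
    # A tree whose query leaf holds no training point contributes zero weight.
    per_tree = same_leaf / np.maximum(leaf_sizes, 1)
    return per_tree.mean(axis=0)


def weighted_mean(y, alpha):
    # Solves sum_i alpha_i * (y_i - theta) = 0 for theta.
    return float(np.dot(alpha, y) / alpha.sum())


def weighted_quantile(y, alpha, q):
    # Smallest y whose cumulative normalized weight reaches q; this
    # approximately solves sum_i alpha_i * (1{y_i <= theta} - q) = 0.
    order = np.argsort(y)
    cum = np.cumsum(alpha[order]) / alpha.sum()
    idx = int(np.searchsorted(cum, q, side="left"))
    return float(y[order][min(idx, len(y) - 1)])


# Hypothetical leaf assignments: 2 trees, 4 training points, 1 query point.
leaf_ids_train = np.array([[0, 0, 1, 1],
                           [0, 1, 1, 1]])
leaf_ids_x = np.array([0, 1])
y = np.array([1.0, 2.0, 3.0, 4.0])

alpha = neighbor_weights(leaf_ids_train, leaf_ids_x)
print("weights:", alpha)
print("local mean:", weighted_mean(y, alpha))
print("local median:", weighted_quantile(y, alpha, 0.5))
```

Training points that share a leaf with the query point in more trees receive larger weights, which is how the weighting adapts to the data in place of a fixed kernel bandwidth. Other quantities of interest would swap in a different moment function and a suitable solver.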

Article information

Source
Ann. Statist., Volume 47, Number 2 (2019), 1148-1178.

Dates
Received: July 2017
Revised: April 2018
First available in Project Euclid: 11 January 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1547197251

Digital Object Identifier
doi:10.1214/18-AOS1709

Mathematical Reviews number (MathSciNet)
MR3909963

Zentralblatt MATH identifier
07033164

Subjects
Primary: 62G05: Estimation

Keywords
Asymptotic theory; causal inference; instrumental variable

Citation

Athey, Susan; Tibshirani, Julie; Wager, Stefan. Generalized random forests. Ann. Statist. 47 (2019), no. 2, 1148–1178. doi:10.1214/18-AOS1709. https://projecteuclid.org/euclid.aos/1547197251


References

  • Abadie, A. (2003). Semiparametric instrumental variable estimation of treatment response models. J. Econometrics 113 231–263.
  • Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Comput. 9 1545–1588.
  • Andrews, D. W. K. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica 61 821–856.
  • Angrist, J. D. (1990). Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records. Amer. Econ. Rev. 313–336.
  • Angrist, J. D. and Evans, W. N. (1998). Children and their parents’ labor supply: Evidence from exogenous variation in family size. Amer. Econ. Rev. 450–477.
  • Arlot, S. and Genuer, R. (2014). Analysis of purely random forests bias. ArXiv preprint. Available at arXiv:1407.3939.
  • Athey, S. and Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proc. Natl. Acad. Sci. USA 113 7353–7360.
  • Athey, S., Tibshirani, J. and Wager, S. (2018). Supplement to “Generalized random forests.” DOI:10.1214/18-AOS1709SUPP.
  • Belloni, A., Chen, D., Chernozhukov, V. and Hansen, C. (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 2369–2429.
  • Beygelzimer, A. and Langford, J. (2009). The offset tree for learning with partial labels. In Proceedings of KDD 129–138. ACM.
  • Biau, G. (2012). Analysis of a random forests model. J. Mach. Learn. Res. 13 1063–1095.
  • Biau, G. and Devroye, L. (2010). On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification. J. Multivariate Anal. 101 2499–2518.
  • Biau, G., Devroye, L. and Lugosi, G. (2008). Consistency of random forests and other averaging classifiers. J. Mach. Learn. Res. 9 2015–2033.
  • Biau, G. and Scornet, E. (2016). A random forest guided tour. TEST 25 197–227.
  • Breiman, L. (1996). Bagging predictors. Mach. Learn. 24 123–140.
  • Breiman, L. (2001). Random forests. Mach. Learn. 45 5–32.
  • Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth Advanced Books and Software, Belmont, CA.
  • Bühlmann, P. and Yu, B. (2002). Analyzing bagging. Ann. Statist. 30 927–961.
  • Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econom. J. 21 C1–C68.
  • Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). BART: Bayesian additive regression trees. Ann. Appl. Stat. 4 266–298.
  • Darolles, S., Fan, Y., Florens, J. P. and Renault, E. (2011). Nonparametric instrumental regression. Econometrica 79 1541–1565.
  • Denil, M., Matheson, D. and De Freitas, N. (2014). Narrowing the gap: Random forests in theory and in practice. In Proceedings of ICML 665–673.
  • Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn. 40 139–157.
  • Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. CBMS-NSF Regional Conference Series in Applied Mathematics 38. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
  • Efron, B. and Stein, C. (1981). The jackknife estimate of variance. Ann. Statist. 9 586–596.
  • Fan, J., Farmen, M. and Gijbels, I. (1998). Local maximum likelihood estimation and inference. J. R. Stat. Soc. Ser. B. Stat. Methodol. 60 591–608.
  • Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Monographs on Statistics and Applied Probability 66. Chapman & Hall, London.
  • Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 1189–1232.
  • Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2004). Bayesian Data Analysis, 2nd ed. Chapman & Hall/CRC, Boca Raton, FL.
  • Geurts, P., Ernst, D. and Wehenkel, L. (2006). Extremely randomized trees. Mach. Learn. 63 3–42.
  • Gordon, L. and Olshen, R. A. (1985). Tree-structured survival analysis. Cancer Treat. Rep. 69 1065–1069.
  • Hampel, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69 383–393.
  • Hansen, B. E. (1992). Testing for parameter instability in linear models. J. Policy Model. 14 517–533.
  • Hartford, J., Lewis, G., Leyton-Brown, K. and Taddy, M. (2017). Deep IV: A flexible approach for counterfactual prediction. In Proceedings of ICML 1414–1423.
  • Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed. Springer, New York.
  • Hill, J. L. (2011). Bayesian nonparametric modeling for causal inference. J. Comput. Graph. Statist. 20 217–240.
  • Hjort, N. L. and Koning, A. (2002). Tests for constancy of model parameters over time. J. Nonparametr. Stat. 14 113–132.
  • Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20 832–844.
  • Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 19 293–325.
  • Honoré, B. E. and Kyriazidou, E. (2000). Panel data discrete choice models with lagged dependent variables. Econometrica 68 839–874.
  • Hothorn, T., Lausen, B., Benner, A. and Radespiel-Tröger, M. (2004). Bagging survival trees. Stat. Med. 23 77–91.
  • Imbens, G. W. and Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica 62 467–475.
  • Ishwaran, H. and Kogalur, U. B. (2010). Consistency of random survival forests. Statist. Probab. Lett. 80 1056–1064.
  • Kallus, N. (2017). Recursive partitioning for personalization using observational data. In Proceedings of ICML 1789–1798.
  • Kleiber, C. and Zeileis, A. (2008). Applied Econometrics with R. Springer Science & Business Media.
  • LeBlanc, M. and Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics 411–425.
  • Lewbel, A. (2007). A local generalized method of moments estimator. Econom. Lett. 94 124–128.
  • Lin, Y. and Jeon, Y. (2006). Random forests and adaptive nearest neighbors. J. Amer. Statist. Assoc. 101 578–590.
  • Loader, C. (1999). Local Regression and Likelihood. Springer, New York.
  • Mallows, C. L. (1973). Some comments on Cp. Technometrics 15 661–675.
  • Meinshausen, N. (2006). Quantile regression forests. J. Mach. Learn. Res. 7 983–999.
  • Mentch, L. and Hooker, G. (2016). Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J. Mach. Learn. Res. 17 26.
  • Molinaro, A. M., Dudoit, S. and van der Laan, M. J. (2004). Tree-based multivariate regression and density estimation with right-censored data. J. Multivariate Anal. 90 154–177.
  • Newey, W. K. (1994a). Kernel estimation of partial means and a general variance estimator. Econometric Theory 10 233–253.
  • Newey, W. K. (1994b). The asymptotic variance of semiparametric estimators. Econometrica 62 1349–1382.
  • Newey, W. K. and Powell, J. L. (2003). Instrumental variable estimation of nonparametric models. Econometrica 71 1565–1578.
  • Neyman, J. (1979). $C(\alpha)$ tests and their use. Sankhya, Ser. A 41 1–21.
  • Nyblom, J. (1989). Testing for the constancy of parameters over time. J. Amer. Statist. Assoc. 84 223–230.
  • Ploberger, W. and Krämer, W. (1992). The CUSUM test with OLS residuals. Econometrica 60 271–285.
  • Poterba, J. M., Venti, S. F. and Wise, D. A. (1996). How retirement saving programs increase saving. J. Econ. Perspect. 10 91–112.
  • Robins, J. M. and Ritov, Y. (1997). Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Stat. Med. 16.
  • Robinson, P. M. (1988). Root-$N$-consistent semiparametric regression. Econometrica 56 931–954.
  • Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41–55.
  • Schick, A. (1986). On asymptotically efficient estimation in semiparametric models. Ann. Statist. 14 1139–1151.
  • Scornet, E., Biau, G. and Vert, J.-P. (2015). Consistency of random forests. Ann. Statist. 43 1716–1741.
  • Sexton, J. and Laake, P. (2009). Standard errors for bagged and random forest estimators. Comput. Statist. Data Anal. 53 801–811.
  • Staniswalis, J. G. (1989). The kernel estimate of a regression function in likelihood-based models. J. Amer. Statist. Assoc. 84 276–283.
  • Stone, C. J. (1977). Consistent nonparametric regression. Ann. Statist. 5 595–645.
  • Su, L., Murtazashvili, I. and Ullah, A. (2013). Local linear GMM estimation of functional coefficient IV models with an application to estimating the rate of return to schooling. J. Bus. Econom. Statist. 31 184–207.
  • Su, X., Tsai, C.-L., Wang, H., Nickerson, D. M. and Li, B. (2009). Subgroup analysis via recursive partitioning. J. Mach. Learn. Res. 10 141–158.
  • Tibshirani, R. and Hastie, T. (1987). Local likelihood estimation. J. Amer. Statist. Assoc. 82 559–567.
  • van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.
  • Varian, H. R. (2014). Big data: New tricks for econometrics. J. Econ. Perspect. 28 3–27.
  • Wager, S. and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. J. Amer. Statist. Assoc. 113 1228–1242.
  • Wager, S., Hastie, T. and Efron, B. (2014). Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. J. Mach. Learn. Res. 15 1625–1651.
  • Wager, S. and Walther, G. (2015). Adaptive concentration of regression trees, with application to random forests. ArXiv preprint. Available at arXiv:1503.06388.
  • Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data, 2nd ed. MIT Press, Cambridge, MA.
  • Wright, M. N. and Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77 1–17.
  • Zeileis, A. (2005). A unified approach to structural change tests based on ML scores, $F$ statistics, and OLS residuals. Econometric Rev. 24 445–466.
  • Zeileis, A. and Hornik, K. (2007). Generalized $M$-fluctuation tests for parameter instability. Stat. Neerl. 61 488–508.
  • Zeileis, A., Hothorn, T. and Hornik, K. (2008). Model-based recursive partitioning. J. Comput. Graph. Statist. 17 492–514.
  • Zhu, R., Zeng, D. and Kosorok, M. R. (2015). Reinforcement learning trees. J. Amer. Statist. Assoc. 110 1770–1784.

Supplemental materials

  • Supplement to “Generalized random forests”. The supplement contains proofs of technical results, as well as a simulation study for instrumental variables regression with forests.