The Annals of Applied Statistics

Bottom-up estimation and top-down prediction: Solar energy prediction combining information from multiple sources

Youngdeok Hwang, Siyuan Lu, and Jae-Kwang Kim

Full-text: Open access


Accurately forecasting solar power using data from multiple sources is an important but challenging problem. Our goal is to combine the forecasting outputs of two different physics models with real measurements from an automated monitoring network so as to better predict solar power in a timely manner. To this end, we propose a new approach for analyzing large-scale multilevel models with great computational efficiency, requiring minimal monitoring and intervention. This approach divides the large-scale data set into smaller ones of manageable size, based on their physical locations, and fits a local model in each area. The local model estimates are then combined sequentially, following the specified multilevel models, using our novel bottom-up approach for parameter estimation. The prediction, on the other hand, is implemented in a top-down manner. The proposed method is applied to the solar energy prediction problem for the U.S. Department of Energy’s SunShot Initiative.
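
The divide, combine, and predict pattern described in the abstract can be illustrated with a small sketch. This is not the authors' estimator: it assumes a simple two-level linear model, pools the local estimates by precision weighting, and shrinks each local estimate toward the pooled one using a hypothetical between-region variance `tau2`. The function names and the simulated data are likewise illustrative only.

```python
import numpy as np

def fit_local(x, y):
    """Ordinary least squares of y on [1, x] for one region.

    Returns the coefficient estimate and its estimated covariance.
    """
    X = np.column_stack([np.ones_like(x), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2 * xtx_inv

def combine_bottom_up(betas, covs):
    """Pool local estimates into one upper-level estimate (precision weighting).

    Assumption: between-region variation is ignored at this step.
    """
    precisions = [np.linalg.inv(c) for c in covs]
    total_precision = np.sum(precisions, axis=0)
    weighted = np.sum([p @ b for p, b in zip(precisions, betas)], axis=0)
    pooled_cov = np.linalg.inv(total_precision)
    return pooled_cov @ weighted, pooled_cov

def predict_top_down(pooled_beta, tau2, beta_local, cov_local, x_new):
    """Shrink a local estimate toward the pooled one, then predict at x_new.

    tau2 is a hypothetical between-region variance (prior covariance tau2 * I).
    """
    prior_prec = np.eye(len(pooled_beta)) / tau2
    local_prec = np.linalg.inv(cov_local)
    post_cov = np.linalg.inv(prior_prec + local_prec)
    beta_shrunk = post_cov @ (prior_prec @ pooled_beta + local_prec @ beta_local)
    return beta_shrunk[0] + beta_shrunk[1] * x_new

# Simulated data for three regions (illustrative, not from the paper).
rng = np.random.default_rng(0)
regions = {}
for r in range(3):
    x = rng.uniform(0.0, 1.0, 200)
    slope = 2.0 + 0.1 * rng.standard_normal()
    y = 1.0 + slope * x + 0.1 * rng.standard_normal(200)
    regions[r] = (x, y)

# Bottom-up: fit each region separately, then pool the local estimates.
local = {r: fit_local(x, y) for r, (x, y) in regions.items()}
betas, covs = zip(*local.values())
pooled_beta, pooled_cov = combine_bottom_up(betas, covs)

# Top-down: predict for region 0 at x = 0.5 using the pooled estimate as a prior.
beta0, cov0 = local[0]
pred = predict_top_down(pooled_beta, tau2=0.01, beta_local=beta0,
                        cov_local=cov0, x_new=0.5)
print(pooled_beta, pred)
```

In this sketch each local fit touches only its own region's data, so the fits can run independently, and only the low-dimensional estimates and covariances are passed upward. A very small `tau2` pulls the prediction toward the pooled model, while a very large one leaves the local fit essentially unchanged.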

Article information

Ann. Appl. Stat., Volume 12, Number 4 (2018), 2096-2120.

Received: December 2016
Revised: January 2018
First available in Project Euclid: 13 November 2018

Keywords: Big data; prediction; large-scale monitoring data; multilevel model; physics models; solar power generation


Hwang, Youngdeok; Lu, Siyuan; Kim, Jae-Kwang. Bottom-up estimation and top-down prediction: Solar energy prediction combining information from multiple sources. Ann. Appl. Stat. 12 (2018), no. 4, 2096–2120. doi:10.1214/18-AOAS1145.
