Volume 2 Issue 4 | The Annals of Applied Statistics

VOL. 2 · NO. 4 | December 2008

Frontmatter

Table of contents

Ann. Appl. Stat. 2 (4), (December 2008) Open Access

No abstract available

Editorial Board

Ann. Appl. Stat. 2 (4), (December 2008) Open Access

No abstract available

Special Section on Atmospheric Science

Special section on statistics in the atmospheric sciences

Montserrat Fuentes, Peter Guttorp, Michael L. Stein

Ann. Appl. Stat. 2 (4), 1143-1147, (December 2008) DOI: 10.1214/08-AOAS209 Open Access

KEYWORDS: Meteorology, precipitation forecasting, Markov models, data assimilation, Air pollution, Climate change

No abstract available

Spatial–temporal mesoscale modeling of rainfall intensity using gage and radar data

Montserrat Fuentes, Brian Reich, Gyuwon Lee

Ann. Appl. Stat. 2 (4), 1148-1169, (December 2008) DOI: 10.1214/08-AOAS166 Open Access

KEYWORDS: Conditionally autoregressive models, full symmetry, nonstationarity, rainfall modelling, spatial logistic regression, spatial–temporal models

Gridded estimated rainfall intensity values at very high spatial and temporal resolution levels are needed as main inputs for weather prediction models to obtain accurate precipitation forecasts, and to verify the performance of precipitation forecast models. These gridded rainfall fields are also the main driver for hydrological models that forecast flash floods, and they are essential for disaster prediction associated with heavy rain. Rainfall information can be obtained from rain gages that provide relatively accurate estimates of the actual rainfall values at point-referenced locations, but they do not characterize well enough the spatial and temporal structure of the rainfall fields. Doppler radar data offer better spatial and temporal coverage, but Doppler radar measures effective radar reflectivity (Ze) rather than rainfall rate (R). Thus, rainfall estimates from radar data suffer from various uncertainties due to their measuring principle and the conversion from Ze to R. We introduce a framework to combine radar reflectivity and gage data, by writing the different sources of rainfall information in terms of an underlying unobservable spatial–temporal process with the true rainfall values. We use spatial logistic regression to model the probability of rain for both sources of data in terms of the latent true rainfall process. We characterize the different sources of bias and error in the gage and radar data and we estimate the true rainfall intensity with its posterior predictive distribution, conditioning on the observed data. Our model allows for nonstationarity and asymmetry in the spatio-temporal dependency structure of the rainfall process, and allows the temporal evolution of the rainfall process to depend on the motions of rain fields, and the spatial correlation to depend on geographic features. We apply our methods to estimate rainfall intensity every 10 minutes, in a subdomain over South Korea with a spatial resolution of 1 km by 1 km.
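
As a rough illustration of the Ze-to-R conversion mentioned in the abstract above, the sketch below applies a power-law relation Z = a·R^b to a reflectivity value given in dBZ. The function name and the default coefficients a = 200, b = 1.6 (the classical Marshall-Palmer values) are assumptions made for illustration only; the abstract does not say which relation or coefficients the paper uses.

```python
import math


def reflectivity_to_rain_rate(dbz, a=200.0, b=1.6):
    """Convert radar reflectivity (dBZ) to rain rate R (mm/h) via Z = a * R**b.

    The defaults a=200, b=1.6 are the classical Marshall-Palmer values and are
    an assumption here, not coefficients taken from the paper.
    """
    z_linear = 10.0 ** (dbz / 10.0)      # dBZ -> linear Z (mm^6 m^-3)
    return (z_linear / a) ** (1.0 / b)   # invert the power law for R


if __name__ == "__main__":
    for dbz in (20.0, 30.0, 40.0):
        print(f"{dbz:.0f} dBZ -> {reflectivity_to_rain_rate(dbz):.2f} mm/h")
```

Because the coefficients vary with rain type, any fixed choice of a and b introduces the kind of conversion uncertainty the abstract refers to.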

Probabilistic quantitative precipitation field forecasting using a two-stage spatial model

Veronica J. Berrocal, Adrian E. Raftery, Tilmann Gneiting

Ann. Appl. Stat. 2 (4), 1170-1193, (December 2008) DOI: 10.1214/08-AOAS203 Open Access

KEYWORDS: Discrete-continuous distribution, ensemble forecast, gamma distribution, latent Gaussian process, numerical weather prediction, power truncated normal model, probit model, Tobit model

Interpreting self-organizing maps through space–time data models

Huiyan Sang, Alan E. Gelfand, Chris Lennard, Gabriele Hegerl, Bruce Hewitson

Ann. Appl. Stat. 2 (4), 1194-1216, (December 2008) DOI: 10.1214/08-AOAS174 Open Access

KEYWORDS: Bivariate spatial predictive process, space–time models, Markov chain Monte Carlo, model choice, vector autoregressive model

Self-organizing maps (SOMs) are a technique that has been used with high-dimensional data vectors to develop an archetypal set of states (nodes) that span, in some sense, the high-dimensional space. Noteworthy applications include weather states as described by weather variables over a region and speech patterns as characterized by frequencies in time. The SOM approach is essentially a neural network model that implements a nonlinear projection from a high-dimensional input space to a low-dimensional array of neurons. In the process, it also becomes a clustering technique, assigning to any vector in the high-dimensional data space the node (neuron) to which it is closest (using, say, Euclidean distance) in the data space. The number of nodes is thus equal to the number of clusters. However, the primary use for the SOM is as a representation technique, that is, finding a set of nodes which representatively span the high-dimensional space. These nodes are typically displayed using maps to enable visualization of the continuum of the data space. The technique does not appear to have been discussed in the statistics literature so it is our intent here to bring it to the attention of the community. The technique is implemented algorithmically through a training set of vectors. However, through the introduction of stochasticity in the form of a space–time process model, we seek to illuminate and interpret its performance in the context of application to daily data collection. That is, the observed daily state vectors are viewed as a time series of multivariate process realizations which we try to understand under the dimension reduction achieved by the SOM procedure.

The application we focus on here is to synoptic climatology where the goal is to develop an array of atmospheric states to capture a collection of distinct circulation patterns. In particular, we have daily weather data observed in the form of 11 variables measured for each of 77 grid cells yielding an 847×1 vector for each day. We have such daily vectors for a period of 31 years (11,315 days). Twelve SOM nodes have been obtained by the meteorologists to represent the space of these data vectors. Again, we try to enhance our understanding of dynamic SOM node behavior arising from this dataset.
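
The clustering step described in the abstract above (each data vector is assigned to the node closest to it in Euclidean distance) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' procedure: the function name, the random node positions, and the small sample of 200 days are all assumptions. Only the dimensions (11 variables × 77 grid cells = 847, and 12 nodes) come from the abstract.

```python
import numpy as np


def assign_to_nodes(data, nodes):
    """Return, for each row of `data`, the index of the nearest node (Euclidean)."""
    # Pairwise differences via broadcasting: shape (n_days, n_nodes, dim)
    diffs = data[:, None, :] - nodes[None, :, :]
    # Squared Euclidean distances: shape (n_days, n_nodes)
    sq_dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    return np.argmin(sq_dist, axis=1)


rng = np.random.default_rng(0)
n_vars, n_cells = 11, 77
dim = n_vars * n_cells          # 847-dimensional daily state vector
n_nodes = 12                    # number of SOM nodes quoted in the abstract
n_days = 200                    # small synthetic sample (the study has 11,315 days)

nodes = rng.normal(size=(n_nodes, dim))   # hypothetical, already-trained node vectors
data = rng.normal(size=(n_days, dim))     # hypothetical daily observations
labels = assign_to_nodes(data, nodes)     # one node index per day
```

The resulting `labels` array is a time series of node indices, which is the kind of sequence a space–time model of node behavior would take as its starting point.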

Parameter estimation for computationally intensive nonlinear regression with an application to climate modeling

Dorin Drignei, Chris E. Forest, Doug Nychka

Ann. Appl. Stat. 2 (4), 1217-1230, (December 2008) DOI: 10.1214/08-AOAS210 Open Access

KEYWORDS: Equilibrium climate sensitivity, observed and modeled climate, space–time modeling, statistical surrogate, temperature data

Interpolating fields of carbon monoxide data using a hybrid statistical-physical model

Anders Malmberg, Avelino Arellano, David P. Edwards, Natasha Flyer, Doug Nychka, Christopher Wikle

Ann. Appl. Stat. 2 (4), 1231-1248, (December 2008) DOI: 10.1214/08-AOAS168 Open Access

KEYWORDS: Carbon monoxide, satellite data, Bayesian hierarchical models, interpolation, data assimilation

Estimating exposure response functions using ambient pollution concentrations

Gavin Shaddick, Duncan Lee, James V. Zidek, Ruth Salway

Ann. Appl. Stat. 2 (4), 1249-1270, (December 2008) DOI: 10.1214/08-AOAS177 Open Access

KEYWORDS: environmental epidemiology, Air pollution, personal exposure simulator, Bayesian hierarchical models

This paper presents an approach to estimating the health effects of an environmental hazard. The approach is general in nature, but is applied here to the case of air pollution. It uses a computer model involving ambient pollution and temperature input to simulate the exposures experienced by individuals in an urban area, while incorporating the mechanisms that determine exposures. The output from the model comprises a set of daily exposures for a sample of individuals from the population of interest. These daily exposures are approximated by parametric distributions so that the predictive exposure distribution of a randomly selected individual can be generated. These distributions are then incorporated into a hierarchical Bayesian framework (with inference using Markov chain Monte Carlo simulation) in order to examine the relationship between short-term changes in exposures and health outcomes, while making allowance for long-term trends, seasonality, the effect of potential confounders and the possibility of ecological bias.

The paper applies this approach to particulate pollution (PM₁₀) and respiratory mortality counts for seniors (≥65 years) in Greater London during 1997. Within this substantive epidemiological study, the effects on health of ambient concentrations and (estimated) personal exposures are compared. The proposed model incorporates within-day (or between-individual) variability in personal exposures, which is compared to the more traditional approach of assuming that a single pollution level applies to the entire population for each day. Effects were estimated using single lags and distributed lag models, with the highest relative risk, RR=1.02 (1.01–1.04), being associated with a two-day lag of ambient PM₁₀ concentrations. Individual exposures to PM₁₀ for this group (seniors) were lower than the measured ambient concentrations, with the corresponding risk, RR=1.05 (1.01–1.09), being higher than would be suggested by the traditional approach using ambient concentrations.

Nonstationary covariance models for global data

Mikyoung Jun, Michael L. Stein

Ann. Appl. Stat. 2 (4), 1271-1289, (December 2008) DOI: 10.1214/08-AOAS183 Open Access

KEYWORDS: Nonstationary covariance function, processes on spheres, TOMS ozone data, fast Fourier transform

Articles

New multicategory boosting algorithms based on multicategory Fisher-consistent losses

Hui Zou, Ji Zhu, Trevor Hastie

Ann. Appl. Stat. 2 (4), 1290-1306, (December 2008) DOI: 10.1214/08-AOAS198 Open Access

KEYWORDS: boosting, Fisher-consistent losses, multicategory classification

Reconstructing the energy landscape of a distribution from Monte Carlo samples

Qing Zhou, Wing Hung Wong

Ann. Appl. Stat. 2 (4), 1307-1331, (December 2008) DOI: 10.1214/08-AOAS196 Open Access

KEYWORDS: Monte Carlo, cluster tree, sublevel set, connected component, disconnectivity graph, posterior distribution, sequence segmentation, change point

Empirical null and false discovery rate inference for exponential families

Armin Schwartzman

Ann. Appl. Stat. 2 (4), 1332-1359, (December 2008) DOI: 10.1214/08-AOAS184 Open Access

KEYWORDS: multiple testing, Multiple comparisons, mixture model, Poisson regression, genome-wide association, brain imaging

A weakly informative default prior distribution for logistic and other regression models

Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau, Yu-Sung Su

Ann. Appl. Stat. 2 (4), 1360-1383, (December 2008) DOI: 10.1214/08-AOAS191 Open Access

KEYWORDS: Bayesian inference, generalized linear model, least squares, hierarchical model, Linear regression, logistic regression, multilevel model, noninformative prior distribution, weakly informative prior distribution

Modeling long-term longitudinal HIV dynamics with application to an AIDS clinical study

Yangxin Huang, Tao Lu

Ann. Appl. Stat. 2 (4), 1384-1408, (December 2008) DOI: 10.1214/08-AOAS192 Open Access

KEYWORDS: AIDS, antiretroviral drug therapy, Bayesian nonlinear mixed-effects models, time-varying drug efficacy, long-term HIV dynamics, longitudinal data

A virologic marker, the number of HIV RNA copies or viral load, is currently used to evaluate antiretroviral (ARV) therapies in AIDS clinical trials. This marker can be used to assess the ARV potency of therapies, but is easily affected by drug exposures, drug resistance and other factors during the long-term treatment evaluation process. HIV dynamic studies have significantly contributed to the understanding of HIV pathogenesis and ARV treatment strategies. However, the models of these studies are used to quantify short-term HIV dynamics (< 1 month), and are not applicable to describe long-term virological response to ARV treatment due to the difficulty of establishing a relationship of antiviral response with multiple treatment factors such as drug exposure and drug susceptibility during long-term treatment. Long-term therapy with ARV agents in HIV-infected patients often results in failure to suppress the viral load. Pharmacokinetics (PK), drug resistance and imperfect adherence to prescribed antiviral drugs are important factors explaining the resurgence of virus. To better understand the factors responsible for the virological failure, this paper develops the mechanism-based nonlinear differential equation models for characterizing long-term viral dynamics with ARV therapy. The models directly incorporate drug concentration, adherence and drug susceptibility into a function of treatment efficacy and, hence, fully integrate virologic, PK, drug adherence and resistance from an AIDS clinical trial into the analysis. A Bayesian nonlinear mixed-effects modeling approach in conjunction with the rescaled version of dynamic differential equations is investigated to estimate dynamic parameters and make inference. In addition, the correlations of baseline factors with estimated dynamic parameters are explored and some biologically meaningful correlation results are presented. 
Further, the estimated dynamic parameters in patients with virologic success were compared to those in patients with virologic failure, and the significant findings were summarized. These results suggest that viral dynamic parameters may play an important role in understanding HIV pathogenesis and in designing new treatment strategies for the long-term care of AIDS patients.

A Bayesian framework for estimating vaccine efficacy per infectious contact

Yang Yang, Peter Gilbert, Ira M. Longini, Jr., M. Elizabeth Halloran

Ann. Appl. Stat. 2 (4), 1409-1431, (December 2008) DOI: 10.1214/08-AOAS193 Open Access

KEYWORDS: Vaccine efficacy, Bayesian, MCMC, measurement error, copula

Nonparametric spectral analysis with applications to seizure characterization using EEG time series

Li Qin, Yuedong Wang

Ann. Appl. Stat. 2 (4), 1432-1451, (December 2008) DOI: 10.1214/08-AOAS185 Open Access

KEYWORDS: EEG, Epilepsy, GACV, GML, locally stationary process, Permutation test, smoothing parameter, smoothing spline, SS ANOVA

A mixture of experts model for rank data with applications in election studies

Isobel Claire Gormley, Thomas Brendan Murphy

Ann. Appl. Stat. 2 (4), 1452-1477, (December 2008) DOI: 10.1214/08-AOAS178 Open Access

KEYWORDS: rank data, Mixture models, generalized linear models, EM algorithm, MM algorithm

Bayesian multinomial regression with class-specific predictor selection

Paul Gustafson, Geneviève Lefebvre

Ann. Appl. Stat. 2 (4), 1478-1502, (December 2008) DOI: 10.1214/08-AOAS188 Open Access

KEYWORDS: Bayesian model averaging, classification, Markov chain Monte Carlo, multinomial models

State-space based mass event-history model I: Many decision-making agents with one target

Hsieh Fushing, Li Zhu, David I. Shapiro-Ilan, James F. Campbell, Edwin E. Lewis

Ann. Appl. Stat. 2 (4), 1503-1522, (December 2008) DOI: 10.1214/08-AOAS189 Open Access

KEYWORDS: Extremists, Heterogeneity, interval censoring, logistic regression, maximum likelihood estimation, Nematode, Parasite infection, Weibull distribution

Real time estimation in local polynomial regression, with application to trend-cycle analysis

Tommaso Proietti, Alessandra Luati

Ann. Appl. Stat. 2 (4), 1523-1553, (December 2008) DOI: 10.1214/08-AOAS195 Open Access

KEYWORDS: Henderson filter, trend estimation, Musgrave asymmetric filters

Controlled stratification for quantile estimation

Claire Cannamela, Josselin Garnier, Bertrand Iooss

Ann. Appl. Stat. 2 (4), 1554-1580, (December 2008) DOI: 10.1214/08-AOAS186 Open Access

KEYWORDS: Quantile estimation, Monte Carlo methods, variance reduction, computer experiments
