In this paper we develop a likelihood-free simulation methodology for obtaining Bayesian inference in models for low integer-valued time series data that have computationally demanding likelihood functions. The algorithm fits within the framework of particle Markov chain Monte Carlo (PMCMC) methods and uses a so-called alive particle filter. The particle filter requires only model simulations and, in this regard, our approach has connections with approximate Bayesian computation (ABC). However, an advantage of the PMCMC approach in this setting is that simulated data can be matched with the observed data one observation at a time, rather than matching on the full dataset simultaneously or on a low-dimensional, non-sufficient summary statistic, as is common practice in ABC. For low integer-valued time series data, we find that it is often computationally feasible to match simulated data with observed data exactly. The alive particle filter uses negative binomial sampling to maintain a fixed number of particles. The filter yields an unbiased estimate of the likelihood, so embedding it in an MCMC algorithm delivers exact posterior inference. In cases where exact matching is computationally prohibitive, a tolerance is introduced as in ABC. This paper further develops the alive particle filter by introducing auxiliary variables so that partially observed and/or non-Markovian models can be accommodated. We demonstrate that Bayesian model choice problems involving such models can be handled with this approach. The methodology is illustrated on a wide variety of models for simulated and real low-count time series data spanning a rich set of applications.
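As a concrete illustration of the exact-matching idea, the following is a minimal sketch of one step of an alive particle filter for a toy integer-valued autoregressive (INAR(1)-style) count model; the transition model, parameter values, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_step(x_prev, alpha, lam):
    """Toy INAR(1) transition: binomial thinning plus Poisson innovations (illustrative only)."""
    return rng.binomial(x_prev, alpha) + rng.poisson(lam)

def alive_filter_step(particles, y_obs, alpha, lam, N):
    """One step of an alive particle filter with exact matching.

    Simulation continues until N + 1 particles match the observed count exactly;
    the negative binomial estimator N / (n_trials - 1) is then unbiased for the
    matching probability, so the product of these increments over time gives an
    unbiased likelihood estimate for use inside PMCMC.
    """
    matched, n_trials = [], 0
    while len(matched) < N + 1:
        x_prev = rng.choice(particles)          # resample a parent particle uniformly
        x_new = simulate_step(x_prev, alpha, lam)
        n_trials += 1
        if x_new == y_obs:                      # exact match with the observation
            matched.append(x_new)
    p_hat = N / (n_trials - 1)                  # unbiased likelihood increment
    return np.array(matched[:N]), p_hat

# tiny usage example on made-up values
particles = np.full(100, 3)
particles, p_hat = alive_filter_step(particles, y_obs=4, alpha=0.5, lam=1.0, N=100)
```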
We present a Bayesian variable selection method based on an extension of Zellner's g-prior in linear models. More specifically, we propose a two-component g-prior in which a tuning parameter, calibrated by the use of pseudo-variables, is introduced to adjust the distance between the two components. We show that implementing the proposed prior in variable selection is more efficient than using Zellner's g-prior. Simulation results also indicate that models selected with the two-component g-prior generally incur smaller losses than those selected by the other methods considered in our work. The proposed method is further demonstrated on our motivating gene expression data from a lung disease study and on ozone data analyzed in earlier studies.
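The abstract does not spell out the two-component g-prior itself, so the sketch below only shows the standard Zellner g-prior machinery that the proposal extends: the closed-form log Bayes factor of a candidate model against the intercept-only null, which is what makes g-prior variable selection convenient. The data and the choice g = 50 are made up for illustration.

```python
import numpy as np

def g_prior_log_bayes_factor(y, X, g):
    """Log Bayes factor of the model with design X (intercept handled by centring)
    against the intercept-only model, under Zellner's g-prior on the slopes."""
    n, p = X.shape
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    beta_hat, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    rss = np.sum((yc - Xc @ beta_hat) ** 2)
    r2 = 1.0 - rss / np.sum(yc ** 2)
    return 0.5 * (n - p - 1) * np.log(1 + g) - 0.5 * (n - 1) * np.log(1 + g * (1 - r2))

# usage: score two candidate subsets of predictors on made-up data
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] + rng.normal(size=50)
print(g_prior_log_bayes_factor(y, X[:, [0]], g=50))     # true predictor only
print(g_prior_log_bayes_factor(y, X[:, [1, 2]], g=50))  # noise predictors only
```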
Presently, there are few options with available software to perform a fully Bayesian analysis of time-to-event data wherein the hazard is estimated semi- or non-parametrically. One option is the piecewise exponential model, which requires an often unrealistic assumption that the hazard is piecewise constant over time. The primary aim of this paper is to construct a tractable semiparametric alternative to the piecewise exponential model that assumes the hazard is continuous, and to provide modifiable, user-friendly software that allows the use of these methods in a variety of settings. To accomplish this aim, we use a novel model formulation for the log-hazard based on a low-rank thin plate linear spline that readily facilitates adjustment for covariates with time-dependent and proportional hazards effects, possibly subject to shape restrictions. We investigate the performance of our model choices via simulation. We then analyze colorectal cancer data from a clinical trial comparing the effectiveness of two novel treatment regimes relative to the standard of care for overall survival. We estimate a time-dependent hazard ratio for each novel regime relative to the standard of care while adjusting for the effect of aspartate transaminase, a biomarker of liver function, that is subject to a non-decreasing shape restriction.
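The following sketch assumes one common low-rank thin plate spline representation of a smooth one-dimensional function: a linear trend plus radial basis terms |t - knot_k| at a modest number of knots, with (in a Bayesian fit) a mean-zero Gaussian smoothing prior on the basis coefficients. Whether this matches the authors' exact basis and penalty is an assumption, and all values below are made up.

```python
import numpy as np

def log_hazard(t, beta0, beta1, u, knots):
    """Evaluate a low-rank thin plate (linear) spline log-hazard:
    log h(t) = beta0 + beta1 * t + sum_k u_k * |t - knot_k|.
    In a Bayesian fit, the u_k would receive a mean-zero Gaussian prior
    whose variance controls the amount of smoothing."""
    t = np.asarray(t, dtype=float)
    Z = np.abs(t[:, None] - np.asarray(knots)[None, :])   # radial basis design matrix
    return beta0 + beta1 * t + Z @ np.asarray(u)

# usage with made-up coefficients: a continuous (non-piecewise-constant) hazard
t_grid = np.linspace(0, 5, 200)
knots = np.linspace(0.5, 4.5, 8)
u = 0.1 * np.sin(np.arange(8))
hazard = np.exp(log_hazard(t_grid, beta0=-1.0, beta1=0.2, u=u, knots=knots))
```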
The study of sums of possibly associated Bernoulli random variables has been hampered by an asymmetry between positive correlation and negative correlation. The Conway–Maxwell-Binomial (CMB) distribution gracefully models both positive and negative association. This distribution has sufficient statistics and a family of proper conjugate distributions. The relationship of this distribution to the exchangeable special case is explored, and two applications are discussed.
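A minimal sketch of the Conway–Maxwell-binomial pmf as it is usually written, P(X = x) proportional to C(n, x)^nu * p^x * (1 - p)^(n - x) on {0, ..., n}; the parameter values below are illustrative.

```python
import numpy as np
from scipy.special import comb

def cmb_pmf(n, p, nu):
    """Conway-Maxwell-binomial pmf on {0, ..., n}.
    nu = 1 recovers the binomial; nu < 1 gives overdispersion (positive
    association among the Bernoulli summands), nu > 1 gives underdispersion
    (negative association)."""
    x = np.arange(n + 1)
    w = comb(n, x) ** nu * p ** x * (1 - p) ** (n - x)
    return w / w.sum()

# usage: compare dispersion for fixed n and p
for nu in (0.5, 1.0, 2.0):
    pmf = cmb_pmf(20, 0.3, nu)
    mean = np.sum(np.arange(21) * pmf)
    var = np.sum((np.arange(21) - mean) ** 2 * pmf)
    print(nu, round(mean, 3), round(var, 3))
```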
Prior distributions are important in Bayesian inference about rare events because historical data are scarce, and experts are an important source of information for eliciting a prior distribution. I propose a method for incorporating expert information into nonparametric Bayesian inference on rare events when expert knowledge is elicited only as moment conditions on a finite-dimensional parameter. I generalize the Dirichlet process mixture model so that the Dirichlet process (DP) prior incorporates the expert information and satisfies the expert's moment conditions. Among all the priors that comply with the expert knowledge, I use the one that is closest to the original DP prior in Kullback–Leibler divergence. The resulting prior distribution is obtained by exponentially tilting the DP prior. I provide a Metropolis–Hastings algorithm for sampling from posterior distributions under exponentially tilted DP priors. The proposed method combines prior information from a statistician and an expert by finding the least-informative prior that is consistent with the expert information.
Bayesian analysis of functions and curves is considered, where warping and other geometrical transformations are often required for meaningful comparisons. The functions and curves of interest are represented using the recently introduced square root velocity function, which enables a warping invariant elastic distance to be calculated in a straightforward manner. We distinguish between various spaces of interest: the original space, the ambient space after standardizing, and the quotient space after removing a group of transformations. Using Gaussian process models in the ambient space and Dirichlet priors for the warping functions, we explore Bayesian inference for curves and functions. Markov chain Monte Carlo algorithms are introduced for simulating from the posterior. We also compare ambient and quotient space estimators for mean shape, and explain their frequent similarity in many practical problems using a Laplace approximation. Simulation studies are carried out, as well as practical alignment of growth rate functions and shape classification of mouse vertebra outlines in evolutionary biology. We also compare the performance of our Bayesian method with some alternative approaches.
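A minimal sketch of the square root velocity transform and the resulting (unwarped) L2 distance; the elastic distance additionally minimizes over warping functions, typically by dynamic programming, which is omitted here. The toy functions are made up.

```python
import numpy as np

def srvf(f, t):
    """Square root velocity function q(t) = f'(t) / sqrt(|f'(t)|),
    computed on a grid by finite differences."""
    df = np.gradient(f, t)
    return df / np.sqrt(np.abs(df) + 1e-12)   # small constant guards against f'(t) = 0

def l2_distance(q1, q2, t):
    """L2 distance between two SRVFs; the elastic distance would additionally
    minimize this over warpings of one of the functions (omitted here)."""
    return np.sqrt(np.trapz((q1 - q2) ** 2, t))

# usage on two toy functions differing by a time warp
t = np.linspace(0, 1, 200)
f1, f2 = np.sin(2 * np.pi * t), np.sin(2 * np.pi * t ** 1.3)
print(l2_distance(srvf(f1, t), srvf(f2, t), t))
```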
By expressing prior distributions as general stochastic processes, nonparametric Bayesian methods provide a flexible way to incorporate prior knowledge and constrain the latent structure in statistical inference. The Indian buffet process (IBP) is one such example: it defines a prior distribution on infinite binary feature matrices under the assumption that subjects are exchangeable. The phylogenetic Indian buffet process (pIBP), a derivative of the IBP, models non-exchangeability among subjects through a stochastic process on a rooted tree, similar to those used in phylogenetics, that describes the relationships among the subjects. In this paper, we study the theoretical properties of the IBP and pIBP under a binary factor model. We establish posterior contraction rates for both the IBP and pIBP and substantiate the theoretical results through simulation studies. This is the first work to address frequentist properties of the posterior behavior of the IBP and pIBP. We also demonstrate the practical usefulness of the pIBP prior by applying it to a real data example from cancer genomics in which exchangeability among subjects is violated.
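For readers unfamiliar with the IBP, here is a minimal sketch of its standard "restaurant" construction (the pIBP's tree-structured modification is not shown); alpha and the number of subjects are illustrative.

```python
import numpy as np

def sample_ibp(alpha, n_subjects, rng=None):
    """Draw a binary feature matrix from the Indian buffet process.
    Subject 1 takes Poisson(alpha) features; subject i takes an existing
    feature k with probability m_k / i (m_k = number of previous takers)
    and then Poisson(alpha / i) brand-new features."""
    rng = rng or np.random.default_rng()
    rows, counts = [], []
    for i in range(1, n_subjects + 1):
        row = [rng.random() < m / i for m in counts]     # revisit existing features
        new = rng.poisson(alpha / i)                     # sample new features
        counts = [m + z for m, z in zip(counts, row)] + [1] * new
        rows.append(row + [True] * new)
    Z = np.zeros((n_subjects, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, : len(row)] = row
    return Z

Z = sample_ibp(alpha=2.0, n_subjects=10, rng=np.random.default_rng(0))
print(Z)
```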
We present an approach to incorporating informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data. We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.
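A minimal sketch of the data-augmentation idea on a hypothetical two-variable dataset: synthetic records reproduce the prior marginal probabilities of one variable and leave the remaining variables missing, with the number of augmented records controlling the strength of the prior. Column names, the prior margin, and the rounding-based allocation are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def augment_with_margin(df, variable, prior_probs, n_aug):
    """Append n_aug synthetic records whose `variable` column follows the prior
    marginal probabilities and whose remaining columns are left missing; n_aug
    controls how strongly the prior margin is weighted in the posterior."""
    levels, probs = zip(*prior_probs.items())
    counts = np.round(np.array(probs) * n_aug).astype(int)   # deterministic allocation
    synth = pd.DataFrame({variable: np.repeat(levels, counts)})
    for col in df.columns:
        if col != variable:
            synth[col] = pd.NA                                # left missing
    return pd.concat([df, synth[df.columns]], ignore_index=True)

# usage: a prior belief that smoking prevalence is about 20% (made-up data)
df = pd.DataFrame({"smoker": ["yes", "no", "no", "yes"],
                   "age_group": ["<40", "<40", "40+", "40+"]})
aug = augment_with_margin(df, "smoker", {"yes": 0.2, "no": 0.8}, n_aug=100)
```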
Rare populations, such as endangered species, drug users and individuals infected by rare diseases, tend to cluster in regions. Adaptive cluster designs are generally applied to obtain information from clustered and sparse populations. The aim of this work is to propose a unit-level mixture model for clustered and sparse populations when the data are obtained from an adaptive cluster sample. Our approach considers heterogeneity among units belonging to different clusters. The proposed model is evaluated using simulated data and a real experiment in which adaptive samples were drawn from an enumeration of a waterfowl species in a 5,000 km² area of central Florida. The results show that the model is efficient under many settings, even when the level of heterogeneity is low.
Statisticians often use improper priors to express ignorance or to provide good frequency properties, which requires verifying that the resulting posterior is proper. This paper addresses generalized linear mixed models (GLMMs) in which the Level I parameters have Normal distributions, under many commonly used hyperpriors. It provides easy-to-verify sufficient conditions for posterior propriety based on dimensions, matrix ranks, and exponentiated norm bounds (ENBs) for the Level I likelihood. Many familiar likelihoods have ENBs, a property that is often verifiable via log-concavity and finiteness of the MLE, so our novel use of ENBs unifies posterior propriety results and posterior MGF/moment results for many useful Level I distributions, including those commonly used with multilevel generalized linear models, e.g., GLMMs and hierarchical generalized linear models (HGLMs). Readers who need to verify the existence of posterior distributions or of posterior MGFs/moments for a multilevel generalized linear model, given a proper or improper multivariate prior as in Section 1, will find the required results in Sections 1 and 2 and in Theorem 3 (GLMMs), Theorem 4 (HGLMs), or Theorem 5 (posterior MGFs/moments).
The marginal likelihood is a central tool for drawing Bayesian inference about the number of components in mixture models. It is often approximated because its exact form is unavailable. A bias in the approximation can arise when a simulated Markov chain (e.g., a Gibbs sampler) explores the collection of posterior modes incompletely, a phenomenon known as lack of label switching; to overcome the bias, the chain must visit all possible label permutations. In an importance sampling approach, imposing label switching on the importance function makes the computational cost grow exponentially with the number of components. In this paper, two importance sampling schemes are proposed through choices of the importance function: a maximum likelihood estimate (MLE) proposal and a Rao–Blackwellised importance function. The second scheme is called dual importance sampling. We demonstrate that dual importance sampling yields a valid estimator of the evidence. To reduce the resulting computational demand, the original importance function is approximated; a suitable approximation can produce an estimate of the same precision at a lower computational cost.
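The MLE-proposal and Rao–Blackwellised schemes are not specified in enough detail in the abstract to reproduce here, so the sketch below only shows the generic importance-sampling identity for the evidence that both schemes build on; the toy normal-mean check and all names are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

def log_evidence_is(log_prior, log_lik, q_sample, q_logpdf, n_draws=5000):
    """Importance-sampling estimate of log p(y):
    p(y) = E_q[ p(y | theta) p(theta) / q(theta) ],
    so averaging the weights over draws theta_s ~ q is unbiased for the
    evidence on the natural (not log) scale."""
    thetas = q_sample(n_draws)
    log_w = np.array([log_prior(th) + log_lik(th) - q_logpdf(th) for th in thetas])
    return logsumexp(log_w) - np.log(n_draws)

# usage: normal mean with known variance, proposal centred near the MLE (toy check)
y = np.array([0.3, -0.1, 0.5, 0.2])
prior = stats.norm(0, 1)
q = stats.norm(y.mean(), 0.5)
est = log_evidence_is(prior.logpdf,
                      lambda m: stats.norm(m, 1).logpdf(y).sum(),
                      lambda n: q.rvs(n, random_state=1),
                      q.logpdf)
print(est)
```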
Spatial smoothing is an essential step in the analysis of functional magnetic resonance imaging (fMRI) data. One standard smoothing method is to convolve the image data with a three-dimensional Gaussian kernel, which applies a fixed amount of smoothing to the entire image. In pre-surgical brain image analysis, where spatial accuracy is paramount, this method is not reasonable because it can blur the boundaries between activated and deactivated regions of the brain. Moreover, while a standard fMRI analysis calls for strict false positive control, in pre-surgical planning false negatives are of greater concern. To address these issues, we propose a novel spatially adaptive conditionally autoregressive model in which the variances in the full conditionals of the means are proportional to the error variances, allowing the degree of smoothing to vary across the brain. Additionally, we present a new loss function that allows asymmetric treatment of false positives and false negatives. We compare our proposed model with two existing spatially adaptive conditionally autoregressive models. Simulation studies show that our model outperforms these alternatives. As a real application, we apply the proposed model to the pre-surgical fMRI data of two patients to assess peri- and intra-tumoral brain activity.
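As background, this is a minimal sketch of the full conditional of a site-level effect under a standard (intrinsic) conditionally autoregressive prior, with the local error variance entering the conditional variance as a simple multiplicative factor; treating the paper's "proportional to error variances" construction this way is an assumption for illustration only.

```python
import numpy as np

def car_conditional(i, mu, neighbors, tau2, sigma2_i=1.0):
    """Full conditional for site i under an intrinsic CAR prior:
    mean = average of neighbouring values, variance = tau2 / n_i.
    Here the conditional variance is additionally scaled by the local error
    variance sigma2_i, so noisier sites are smoothed less aggressively
    (an illustrative stand-in for the adaptive construction)."""
    nb = neighbors[i]
    mean = np.mean(mu[nb])
    var = tau2 * sigma2_i / len(nb)
    return mean, var

# usage on a one-dimensional chain of 5 sites (illustrative values)
mu = np.array([0.2, 0.5, 0.1, -0.3, 0.0])
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(car_conditional(2, mu, neighbors, tau2=1.0, sigma2_i=0.5))
```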