This paper presents a latent variable representation of regularized support vector machines (SVMs) that enables EM, ECME or MCMC algorithms to provide parameter estimates. We verify our representation by demonstrating that minimizing the SVM optimality criterion together with the parameter regularization penalty is equivalent to finding the mode of a mean-variance mixture of normals pseudo-posterior distribution. The latent variables in the mixture representation lead to EM and ECME point estimates of SVM parameters, as well as MCMC algorithms based on Gibbs sampling that can bring Bayesian tools for Gaussian linear models to bear on SVMs. We show how to implement SVMs with spike-and-slab priors and run them against data from a standard spam-filtering data set.
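The EM point estimation implied by such a mixture representation can be sketched as follows. This is a minimal illustration assuming the standard mean-variance mixture of the hinge loss, in which the E-step reduces to inverse-residual weights and the M-step to a weighted ridge regression; the paper's exact updates, priors, and stopping rules may differ.

```python
import numpy as np

def svm_em(X, y, lam=1.0, n_iter=100, eps=1e-8):
    """EM-style point estimation for a linear SVM via a mean-variance
    mixture-of-normals representation of the hinge loss (illustrative
    sketch, not the paper's exact algorithm)."""
    n, p = X.shape
    Z = y[:, None] * X                      # signed design matrix, rows y_i * x_i
    beta = np.zeros(p)
    for _ in range(n_iter):
        # E-step: expected inverse latent scales given current beta
        resid = np.abs(1.0 - Z @ beta)
        w = 1.0 / np.maximum(resid, eps)    # E[1/lambda_i | beta]
        # M-step: weighted ridge regression from the augmented Gaussian model
        A = Z.T @ (w[:, None] * Z) + lam * np.eye(p)
        b = Z.T @ (1.0 + w)                 # linear term after completing the square
        beta = np.linalg.solve(A, b)
    return beta
```

On a linearly separable toy problem the fitted coefficients reproduce the correct class labels via `np.sign(X @ beta)`.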
Statistical methods of inference typically require the likelihood function to be computable in a reasonable amount of time. The class of "likelihood-free" methods termed Approximate Bayesian Computation (ABC) is able to eliminate this requirement, replacing the evaluation of the likelihood with simulation from it. Likelihood-free methods have gained in efficiency and popularity in the past few years, following their integration with Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) in order to better explore the parameter space. They have been applied primarily to estimating the parameters of a given model, but can also be used to compare models.
Here we present novel likelihood-free approaches to model comparison, based upon the independent estimation of the evidence of each model under study. Key advantages of these approaches over previous techniques are that they allow the exploitation of MCMC or SMC algorithms for exploring the parameter space, and that they do not require a sampler able to mix between models. We validate the proposed methods using a simple exponential family problem before turning to a realistic problem from human population genetics: the comparison of different demographic models based upon genetic data from the Y chromosome.
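The basic idea of estimating each model's evidence independently can be illustrated with plain rejection ABC, where the evidence estimate is simply the prior-predictive acceptance rate. The two toy models, summary statistic, and tolerance below are illustrative assumptions, not taken from the paper, and the paper's MCMC/SMC estimators are far more efficient than this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_evidence(simulate, prior_sample, s_obs, eps, n=50_000):
    """Rejection-ABC estimate of the evidence p(s_obs | M): the fraction
    of prior-predictive simulations whose summary statistic lands within
    eps of the observed one (illustrative sketch)."""
    accepted = 0
    for _ in range(n):
        theta = prior_sample()
        if abs(simulate(theta) - s_obs) <= eps:
            accepted += 1
    return accepted / n

# Toy exponential-family comparison (all numbers illustrative):
# M1: Poisson(theta), theta ~ Exp(1); M2: Geometric(p) on {0,1,...}, p ~ Uniform.
# Summary statistic: the mean of 20 simulated observations.
data_mean = 3.0
ev1 = abc_evidence(lambda t: rng.poisson(t, 20).mean(),
                   lambda: rng.exponential(1.0), data_mean, eps=0.2)
ev2 = abc_evidence(lambda p: rng.geometric(p, 20).mean() - 1.0,
                   lambda: rng.uniform(1e-6, 1.0), data_mean, eps=0.2)
bayes_factor = ev1 / ev2   # ratio of independent evidence estimates
```

Because each evidence is estimated separately, no trans-model sampler is needed; the Bayes factor is just the ratio of the two acceptance rates.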
In functional data analysis (FDA) it is of interest to generalize techniques of multivariate analysis, such as canonical correlation analysis or regression, to functions, which are often observed with noise. In the proposed Bayesian approach to FDA two tools are combined: (i) a special Demmler-Reinsch-type basis of interpolation splines to represent functions parsimoniously and flexibly; (ii) latent variable models initially introduced for probabilistic principal components analysis or canonical correlation analysis of the corresponding coefficients. In this way partial curves and non-Gaussian measurement error schemes can be handled. Bayesian inference is based on a variational algorithm, so that computations are straightforward and fast, in keeping with the view of FDA as a toolbox for exploratory data analysis. The performance of the approach is illustrated with synthetic and real data sets.
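The latent variable building block the abstract refers to, probabilistic principal components analysis, has a well-known closed-form maximum-likelihood solution (Tipping and Bishop) that can be sketched compactly. Applying it to spline basis coefficients, handling partial curves, and the variational treatment from the paper are not shown here.

```python
import numpy as np

def ppca_ml(Y, q):
    """Closed-form maximum-likelihood PPCA: Y (n x d) is modelled as
    y = W z + noise with z ~ N(0, I_q) and isotropic noise variance
    sigma2 (illustrative sketch of the latent variable component)."""
    n, d = Y.shape
    Yc = Y - Y.mean(axis=0)
    S = Yc.T @ Yc / n                       # sample covariance
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]         # sort eigenpairs descending
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()               # noise variance from discarded directions
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2
```

With data simulated from a true two-factor model, the recovered noise variance matches the generating value closely.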
This work introduces a specific application of Bayesian nonparametric statistics to the food risk analysis framework. The goal was to determine the cocktails of pesticide residues to which the French population is simultaneously exposed through its current diet, in order to study their possible combined effects on health through toxicological experiments. To do this, the joint distribution of exposures to a large number of pesticides, which we call the co-exposure distribution, was assessed from the available consumption data and food contamination analyses. We propose modelling the co-exposure using a Dirichlet process mixture based on a multivariate Gaussian kernel so as to determine groups of individuals with similar co-exposure patterns. Posterior distributions and the optimal partition were computed through a Gibbs sampler based on stick-breaking priors. The correlation matrix of the sub-population co-exposures is then used to define the cocktails of pesticides to which they are jointly exposed at high doses. To reduce the computational burden due to the high data dimensionality, a random-block sampling approach was used. In addition, we propose to account for the uncertainty of food contamination through the introduction of an additional level of hierarchy in the model. The results of both specifications are described and compared.
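The core machinery, a blocked Gibbs sampler for a truncated stick-breaking Dirichlet process mixture, can be sketched in one dimension with a known kernel variance. This is a deliberately simplified illustration: the paper uses a multivariate Gaussian kernel, random-block updates, and an extra hierarchy level for contamination uncertainty, none of which appear below, and the hyperparameters are illustrative.

```python
import numpy as np

def dpm_gibbs(y, K=20, alpha=1.0, n_sweeps=100, sigma2=1.0, tau2=4.0, seed=0):
    """Blocked Gibbs sampler for a truncated stick-breaking DP mixture of
    1-D Gaussians: known kernel variance sigma2, N(0, tau2) base measure,
    truncation level K (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    z = rng.integers(0, K, size=n)            # cluster labels
    mu = rng.normal(0.0, np.sqrt(tau2), K)    # component means
    for _ in range(n_sweeps):
        # 1. stick proportions | labels: v_k ~ Beta(1 + n_k, alpha + sum_{j>k} n_j)
        counts = np.bincount(z, minlength=K)
        tail = np.concatenate((np.cumsum(counts[::-1])[::-1][1:], [0]))
        v = rng.beta(1.0 + counts, alpha + tail)
        v[-1] = 1.0                           # close the stick at truncation K
        w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
        # 2. labels | weights, means (multinomial draw per observation)
        logp = np.log(w + 1e-300) - 0.5 * (y[:, None] - mu) ** 2 / sigma2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(K, p=p[i]) for i in range(n)])
        # 3. means | labels (conjugate normal update)
        counts = np.bincount(z, minlength=K)
        var_post = 1.0 / (counts / sigma2 + 1.0 / tau2)
        sums = np.bincount(z, weights=y, minlength=K)
        mu = rng.normal(var_post * sums / sigma2, np.sqrt(var_post))
    return z, mu, w
```

The partition implied by the final labels `z` is the kind of object summarized in the paper's co-exposure clustering, there computed over high-dimensional exposure vectors.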
We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowing us to generate a great variety of models while preserving computational simplicity. Particular emphasis is placed on the construction of rich temporal and spatial processes, which are applied to two problems in finance and ecology.
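A draw from such a probit stick-breaking prior is simple to sketch: each stick proportion is the standard normal CDF of a Gaussian variable, and the weights follow the usual stick-breaking recursion. The truncation level and the N(mu, sigma^2) choice for the underlying normals below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def probit_stick_breaking(mu, sigma, K, rng):
    """Draw weights from a truncated probit stick-breaking prior:
    stick proportions v_k = Phi(alpha_k) with alpha_k ~ N(mu, sigma^2)
    (illustrative sketch of the construction in the abstract)."""
    alpha = rng.normal(mu, sigma, size=K)
    v = norm.cdf(alpha)                     # probit transform: proportions in (0, 1)
    v[-1] = 1.0                             # close the stick at truncation K
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining                    # weights sum to 1

rng = np.random.default_rng(42)
w = probit_stick_breaking(0.0, 1.0, K=25, rng=rng)
```

Richer temporal or spatial dependence, as in the paper's finance and ecology applications, comes from correlating the underlying normals across time or space rather than drawing them independently as here.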