A common problem in the analysis of gene expression microarray data is the identification of groups of features that are coherently expressed. For example, one often wishes to know whether a group of genes, clustered because of correlation in one data set, is still highly co-expressed in another data set. Alternatively, some expression array platforms have many relatively short probes for each gene of interest. In this case, a given probe may not be measuring its targeted gene but rather a different gene with a similar sequence region (a phenomenon called cross-hybridization). Accurately detecting the probe sets (groups of probes targeting the same gene) that show highly coherent expression patterns is therefore the best approach to identifying which genes are present in the sample. We develop a Bayesian Factor Model (BFM) to address the general problem of detecting coherent patterns in gene expression data sets. We compare our method with “state of the art” methods for identifying expressed genes in both synthetic and real data sets, and the results indicate that the BFM outperforms the other procedures in detecting transcripts. We also demonstrate the use of factor analysis to identify the presence/absence status of gene modules (groups of coherently expressed genes). Variation in the number of copies of regions of the genome is a well-known and important feature of most cancers. We examine a group of genes representative of Copy Number Alteration (CNA) in breast cancer and then identify the presence/absence of CNA in this region of the genome for other cancers. Coherent patterns can also be evaluated in high-throughput sequencing data, a novel technology for measuring gene expression. We analyze this type of data via a factor model and examine the detection calls in terms of read-mapping uncertainty.
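The idea of "coherent expression" can be illustrated with a much simpler, non-Bayesian proxy than the BFM: the share of a probe set's standardized variance captured by a single common factor (the leading eigenvalue of the probes' correlation matrix divided by the number of probes). The sketch below is an assumption made for illustration only; it is not the authors' Bayesian Factor Model, and the function names are hypothetical. It assumes every probe has non-zero variance across samples.

```python
import math

def correlation_matrix(rows):
    """Pearson correlations between the rows (probes) of a probes-by-samples matrix.

    Assumes every row has non-zero variance.
    """
    scaled = []
    for row in rows:
        mean = sum(row) / len(row)
        centred = [x - mean for x in row]
        norm = math.sqrt(sum(c * c for c in centred))
        scaled.append([c / norm for c in centred])
    return [[sum(a * b for a, b in zip(u, v)) for v in scaled] for u in scaled]

def leading_eigenvalue(mat, iters=200):
    """Largest eigenvalue of a symmetric positive semi-definite matrix by power iteration."""
    n = len(mat)
    v = [1.0 / math.sqrt(n)] * n  # start from the normalised all-ones vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient

def coherence_score(rows):
    """Share of the probes' total standardised variance explained by one common factor."""
    return leading_eigenvalue(correlation_matrix(rows)) / len(rows)

# Hypothetical probe sets (rows = probes, columns = samples).
coherent = [[1, 2, 3, 4], [2, 4, 6, 8], [0.5, 1.0, 1.5, 2.0]]
incoherent = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
print(round(coherence_score(coherent), 6), round(coherence_score(incoherent), 6))  # prints 1.0 0.333333
```

A score near 1 means one factor explains almost all the co-variation (all probes track the same transcript); a score near 1/(number of probes) means the probes behave independently. A model-based approach such as the BFM replaces this point summary with a posterior assessment.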
In this paper, we provide simple and useful expressions for slope influence diagnostics of several conditionally heteroscedastic time series models under innovative model perturbations. These expressions are obtained by establishing a connection between local influence and residual diagnostics. Monte Carlo experiments indicate that the proposed statistics have good size and power. To illustrate the results, we analyze the financial return series of the S&P500 and DJIA indexes.
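Because the diagnostics are linked to residuals, a natural building block is the sequence of standardized residuals of a fitted conditionally heteroscedastic model. The sketch below computes them for a GARCH(1,1) model whose parameters are assumed known; the choice of GARCH(1,1), the initialization at the unconditional variance, and the function name are illustrative assumptions, and the paper's slope influence statistics themselves are not reproduced here.

```python
import math

def garch11_standardized_residuals(returns, omega, alpha, beta):
    """Standardised residuals y_t / sqrt(h_t) of a GARCH(1,1) model with known parameters.

    h_t = omega + alpha * y_{t-1}**2 + beta * h_{t-1}, started at the
    unconditional variance omega / (1 - alpha - beta).
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    h = omega / (1.0 - alpha - beta)
    residuals = []
    for y in returns:
        residuals.append(y / math.sqrt(h))
        h = omega + alpha * y * y + beta * h  # conditional variance for the next period
    return residuals

returns = [0.5, -1.2, 0.3, 2.0, -0.4]  # hypothetical returns, not S&P500/DJIA data
z = garch11_standardized_residuals(returns, omega=0.1, alpha=0.1, beta=0.8)
print([round(v, 3) for v in z])
```

In practice the parameters would be replaced by estimates, and large values of these residuals point to observations worth inspecting with the influence diagnostics.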
Fedorov’s sequential algorithm and Fedorov’s exchange algorithm are well-known and widely used algorithms for constructing D-optimal designs. In this paper, we modify these two algorithms by adding or exchanging two or more points simultaneously at each step, which significantly reduces the number of steps needed to construct a D-optimal design. We also prove that the proposed sequential algorithm converges to a D-optimal design. Optimal designs for rational regression are used as an illustration.
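For reference, the classical one-point version of Fedorov's sequential algorithm can be sketched as follows: at each step the candidate point with the largest standardized prediction variance $d(x)=f(x)^\top M^{-1}f(x)$ receives weight $\alpha=(d_{\max}-p)/\{(d_{\max}-1)p\}$, the step length that maximizes the determinant for a single added point. The quadratic regression model on a grid over $[-1,1]$, the uniform starting design, and all function names are assumptions for illustration (the paper uses rational regression), and the multi-point modification proposed in the paper is not implemented here.

```python
def basis(x):
    """Regression functions of an assumed quadratic model: f(x) = (1, x, x^2)."""
    return [1.0, x, x * x]

def info_matrix(points, weights):
    """Information matrix M = sum_i w_i f(x_i) f(x_i)^T."""
    p = len(basis(0.0))
    m = [[0.0] * p for _ in range(p)]
    for x, w in zip(points, weights):
        fx = basis(x)
        for i in range(p):
            for j in range(p):
                m[i][j] += w * fx[i] * fx[j]
    return m

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pivot = a[col][col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * u for v, u in zip(a[r], a[col])]
    return [row[n:] for row in a]

def variance(x, minv):
    """Standardised prediction variance d(x) = f(x)^T M^{-1} f(x)."""
    fx = basis(x)
    p = len(fx)
    return sum(fx[i] * minv[i][j] * fx[j] for i in range(p) for j in range(p))

def fedorov_sequential(points, iters=3000):
    """One-point sequential algorithm with Fedorov's optimal step length."""
    p = len(basis(0.0))
    weights = [1.0 / len(points)] * len(points)  # uniform, non-singular start
    for _ in range(iters):
        minv = inverse(info_matrix(points, weights))
        d = [variance(x, minv) for x in points]
        k = max(range(len(points)), key=lambda i: d[i])
        alpha = (d[k] - p) / ((d[k] - 1.0) * p)  # zero once d_max equals p
        weights = [(1.0 - alpha) * w for w in weights]
        weights[k] += alpha
    minv = inverse(info_matrix(points, weights))
    return weights, max(variance(x, minv) for x in points)

grid = [i / 10 for i in range(-10, 11)]
weights, d_max = fedorov_sequential(grid)
support = sorted(range(len(grid)), key=lambda i: weights[i])[-3:]
print(sorted(grid[i] for i in support), round(d_max, 3))
```

By the Kiefer–Wolfowitz equivalence theorem, $d_{\max}\ge p$ for every design, with equality exactly at a D-optimal design, so $d_{\max}-p$ doubles as a stopping criterion. For this quadratic example the weight concentrates on $-1$, $0$ and $1$.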
In this paper, we introduce a first-order non-negative integer-valued autoregressive process with power series innovations based on binomial thinning. This new model contains, as particular cases, several models such as the Poisson INAR(1) model (Al-Osh and Alzaid (J. Time Series Anal. 8 (1987) 261–275)), the geometric INAR(1) model (Jazi, Jones and Lai (J. Iran. Stat. Soc. (JIRSS) 11 (2012) 173–190)) and many others. The main properties of the model, such as the mean, variance and autocorrelation function, are derived. Yule–Walker, conditional least squares and conditional maximum likelihood estimators of the model parameters are derived. An extensive Monte Carlo experiment is conducted to evaluate the performance of these estimators in finite samples. Special sub-models are studied in some detail. Applications to two real data sets are given to show the flexibility and potential of the new model.
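The binomial thinning mechanism and the Yule–Walker estimators can be sketched for the Poisson special case (Al-Osh and Alzaid), where $X_t=\alpha\circ X_{t-1}+\varepsilon_t$ and $\alpha\circ X$ is a sum of $X$ independent Bernoulli($\alpha$) variables. The choice of Poisson innovations, the burn-in length, and the function names are illustrative assumptions; the estimators use the facts that the lag-1 autocorrelation of an INAR(1) process equals $\alpha$ and that the innovation mean equals $\mu_X(1-\alpha)$.

```python
import math
import random

def poisson(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_poisson_inar1(alpha, lam, n, seed=0, burn_in=200):
    """X_t = alpha o X_{t-1} + eps_t with binomial thinning and Poisson(lam) innovations."""
    rng = random.Random(seed)
    x, path = 0, []
    for t in range(n + burn_in):
        survivors = sum(1 for _ in range(x) if rng.random() < alpha)  # alpha o X_{t-1}
        x = survivors + poisson(lam, rng)
        if t >= burn_in:
            path.append(x)
    return path

def yule_walker_inar1(path):
    """Yule-Walker estimates: alpha = lag-1 autocorrelation, innovation mean = mean * (1 - alpha)."""
    n = len(path)
    mean = sum(path) / n
    c0 = sum((x - mean) ** 2 for x in path) / n
    c1 = sum((path[t] - mean) * (path[t - 1] - mean) for t in range(1, n)) / n
    alpha_hat = c1 / c0
    return alpha_hat, mean * (1.0 - alpha_hat)

path = simulate_poisson_inar1(alpha=0.5, lam=2.0, n=20000, seed=1)
alpha_hat, mu_eps_hat = yule_walker_inar1(path)
print(round(alpha_hat, 3), round(mu_eps_hat, 3))
```

With $\alpha=0.5$ and $\lambda=2$ the stationary mean is $\lambda/(1-\alpha)=4$, so the estimates should lie close to $0.5$ and $2$. Other power series innovations (for example geometric) would only change the innovation sampler.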
Suppose independent random samples are drawn from $k$ shifted exponential populations with a common location parameter but unequal scale parameters. We consider the problem of estimating the Rényi entropy. The uniformly minimum variance unbiased estimator (UMVUE) is derived. Sufficient conditions for improving on affine and scale equivariant estimators are obtained. As a consequence, estimators improving on the UMVUE and the maximum likelihood estimator (MLE) are obtained. Further, for the case $k=1$, an estimator that dominates the best affine equivariant estimator is derived. Cases in which the location parameter is constrained are also investigated in detail.
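A short calculation shows why the scale parameters carry all the information about the entropy. For a shifted exponential density $f(x)=\sigma^{-1}e^{-(x-\mu)/\sigma}$, $x>\mu$, and Rényi order $\alpha>0$, $\alpha\neq 1$ (notation assumed here for illustration):

$$
H_\alpha=\frac{1}{1-\alpha}\log\int_\mu^\infty f(x)^\alpha\,dx
=\frac{1}{1-\alpha}\log\left(\sigma^{-\alpha}\cdot\frac{\sigma}{\alpha}\right)
=\frac{1}{1-\alpha}\log\frac{\sigma^{1-\alpha}}{\alpha}
=\log\sigma-\frac{\log\alpha}{1-\alpha}.
$$

The common location $\mu$ drops out, so estimating the entropy of the $i$-th population amounts to estimating $\log\sigma_i$ plus a known constant; $\mu$ still matters through the sampling distribution of the estimators of $\sigma_i$.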
We establish a sub-Gaussian lower bound for the transition kernel of the one-dimensional symmetric Bouchaud trap model, confirming the behavior predicted by Bertin and Bouchaud (Phys. Rev. E (3) 67 (2003) 026128). The proof rests on the Ray–Knight description of the local time of a one-dimensional Brownian motion. Using the same ideas, we also prove the corresponding result for the FIN (Fontes–Isopi–Newman) singular diffusion.
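To make the model concrete, the sketch below simulates one path of the symmetric Bouchaud trap model on $\mathbb{Z}$: each site $x$ carries an i.i.d. trap depth $\tau_x$ with $P(\tau_x>s)=s^{-\alpha}$ for $s\ge 1$ and $0<\alpha<1$; the walk waits at $x$ for an exponential time with mean $\tau_x$ and then jumps to either neighbor with probability $1/2$. The Pareto depth law, the lazy sampling of the environment, and the function name are illustrative assumptions; the simulation says nothing about the proof technique.

```python
import random

def bouchaud_trap_position(t_max, alpha=0.5, seed=0):
    """Position at time t_max of a 1-D symmetric Bouchaud trap model started at 0.

    Trap depths are i.i.d. Pareto with P(tau > s) = s**(-alpha), s >= 1.
    The walk waits an exponential time with mean tau_x at site x, then moves
    to x - 1 or x + 1 with probability 1/2 each.
    """
    rng = random.Random(seed)
    depths = {}  # environment sampled lazily from the same generator as the walk
    x, clock = 0, 0.0
    while True:
        if x not in depths:
            depths[x] = (1.0 - rng.random()) ** (-1.0 / alpha)  # uniform in (0, 1] -> Pareto
        clock += rng.expovariate(1.0 / depths[x])  # exponential wait with mean tau_x
        if clock > t_max:
            return x
        x += 1 if rng.random() < 0.5 else -1

positions = [bouchaud_trap_position(500.0, alpha=0.5, seed=s) for s in range(100)]
print(min(positions), max(positions))
```

A fixed seed reproduces both the environment and the path. Repeating over many seeds gives an empirical picture of the transition kernel whose tails the lower bound controls.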
Survival models with univariate frailty may be used when there is no information on covariates that are important for explaining the failure time. The missing information may concern covariates that were not observed, or even covariates that for some reason cannot be measured, for instance, environmental or genetic factors. In this paper, we extend the generalized time-dependent logistic (GTDL) model proposed by Mackenzie (The Statistician 45 (1996) 21–34) by including a frailty term in the model. The proposed methodology uses the Laplace transform to obtain the survival function unconditional on the individual frailty. Estimation is based on maximum likelihood. A simulation study examines the bias, the mean squared errors and the coverage probabilities. A real example on lung cancer illustrates the applicability of the methodology, which is compared with the model without frailty via selection criteria to determine which model best fits the data.
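The Laplace-transform step works as follows: if $S(t\mid z)=\exp\{-zH_0(t)\}$ and the frailty $Z$ has Laplace transform $\mathcal{L}(s)=E[e^{-sZ}]$, then the unconditional survival function is $S(t)=\mathcal{L}(H_0(t))$. The sketch below assumes a gamma frailty with mean 1 and variance $\theta$, for which $\mathcal{L}(s)=(1+\theta s)^{-1/\theta}$, and a GTDL hazard written in the form commonly attributed to Mackenzie, $h_0(t)=\lambda e^{\alpha t+\eta}/(1+e^{\alpha t+\eta})$ with linear predictor $\eta=x^\top\beta$. Both the frailty distribution and this parametrization are assumptions to be checked against the paper, and all parameter values are hypothetical.

```python
import math

def gtdl_cumulative_hazard(t, lam, alpha, eta):
    """Integral of lam * exp(alpha*u + eta) / (1 + exp(alpha*u + eta)) over [0, t].

    eta stands for the linear predictor x'beta; requires lam > 0 and alpha != 0.
    """
    return (lam / alpha) * math.log((1.0 + math.exp(alpha * t + eta)) / (1.0 + math.exp(eta)))

def gamma_frailty_survival(t, lam, alpha, eta, theta):
    """Unconditional survival L(H0(t)) for a gamma frailty with mean 1 and variance theta."""
    h0 = gtdl_cumulative_hazard(t, lam, alpha, eta)
    return (1.0 + theta * h0) ** (-1.0 / theta)

params = dict(lam=0.8, alpha=0.3, eta=-0.5)  # hypothetical values
for t in [0.0, 1.0, 2.0, 5.0]:
    no_frailty = math.exp(-gtdl_cumulative_hazard(t, **params))
    with_frailty = gamma_frailty_survival(t, theta=0.7, **params)
    print(t, round(no_frailty, 4), round(with_frailty, 4))
```

Since $E[Z]=1$, Jensen's inequality gives $\mathcal{L}(s)\ge e^{-s}$, so the frailty model always predicts survival at least as high as the model without frailty with the same baseline.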
In this paper, we propose a hierarchical Bayesian framework with a Dirichlet process prior for gene-by-gene multiple comparison analysis. Comparisons among experimental conditions are made using the posterior probabilities of the hypotheses of equality or inequality. To calculate the posterior probabilities, we use the Pólya urn scheme through latent variables and the Bayes factor. The performance of the proposed method, together with a comparison with the usual Tukey test, is evaluated on artificial data and on a shotgun proteomics data set. The results show that the proposed methodology performs better at identifying differences in means and/or variances.
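Under a Dirichlet process prior, the Pólya urn scheme induces a prior on which experimental conditions share the same parameter: a partition of the $n$ conditions into blocks of sizes $n_1,\dots,n_K$ has probability $M^K\prod_j (n_j-1)!\,/\,[M(M+1)\cdots(M+n-1)]$, where $M$ is the concentration parameter and the base measure is continuous. The sketch below enumerates these prior tie patterns; it only illustrates the prior side of the hypotheses of equality and inequality, not the paper's posterior computation, and the condition labels and function names are hypothetical.

```python
import math

def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):  # put `first` into an existing block
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition  # or give `first` a block of its own

def dp_partition_prob(partition, mass):
    """Polya-urn (Ewens) prior probability of a tie pattern under DP(mass, G0), G0 continuous."""
    n = sum(len(block) for block in partition)
    rising = math.prod(mass + i for i in range(n))  # mass (mass + 1) ... (mass + n - 1)
    numerator = mass ** len(partition) * math.prod(math.factorial(len(b) - 1) for b in partition)
    return numerator / rising

conditions = ["A", "B", "C"]  # hypothetical experimental conditions
for p in set_partitions(conditions):
    print(p, round(dp_partition_prob(p, mass=1.0), 4))
```

With $M=1$ and three conditions, the hypothesis that all three means are equal has prior probability $1/3$; larger $M$ shifts prior mass toward configurations with more distinct values.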