Statistics Surveys Articles (Project Euclid)
http://projecteuclid.org/euclid.ssu
The latest articles from Statistics Surveys on Project Euclid, a site for mathematics and statistics resources.
en-us
Copyright 2010 Cornell University Library
Euclid-L@cornell.edu (Project Euclid Team)
Thu, 05 Aug 2010 15:41 EDT
Thu, 14 Apr 2011 08:17 EDT
http://projecteuclid.org/collection/euclid/images/logo_linking_100.gif
Project Euclid
http://projecteuclid.org/
Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules
http://projecteuclid.org/euclid.ssu/1266847666
<strong>Michael P. Fay</strong>, <strong>Michael A. Proschan</strong><p><strong>Source: </strong>Statist. Surv., Volume 4, 1--39.</p><p><strong>Abstract:</strong><br/>
In a mathematical approach to hypothesis tests, we start with a clearly defined set of hypotheses and choose the test with the best properties for those hypotheses. In practice, we often start with less precise hypotheses. For example, often a researcher wants to know which of two groups generally has the larger responses, and either a t-test or a Wilcoxon-Mann-Whitney (WMW) test could be acceptable. Although both t-tests and WMW tests are usually associated with quite different hypotheses, the decision rule and p-value from either test could be associated with many different sets of assumptions, which we call perspectives. It is useful to have many of the different perspectives to which a decision rule may be applied collected in one place, since each perspective allows a different interpretation of the associated p-value. Here we collect many such perspectives for the two-sample t-test, the WMW test and other related tests. We discuss validity and consistency under each perspective and discuss recommendations between the tests in light of these many different perspectives. Finally, we briefly discuss a decision rule for testing genetic neutrality where knowledge of the many perspectives is vital to the proper interpretation of the decision rule.
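The point that the two tests answer different questions can be seen in a minimal sketch, assuming made-up data with one extreme value. The function names and the data are illustrative and do not come from the paper. The t statistic compares means, while the WMW U statistic counts pairwise orderings.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's two-sample t statistic (no equal-variance assumption)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def mann_whitney_u(x, y):
    """WMW U statistic for x: number of pairs with x_i > y_j, ties count 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Hypothetical data: group_a has one extreme response.
group_a = [1.1, 1.3, 1.2, 1.4, 9.0]
group_b = [1.5, 1.6, 1.7, 1.8, 1.9]

t_stat = welch_t(group_a, group_b)
u_stat = mann_whitney_u(group_a, group_b)
# Estimated probability that a response from group_a exceeds one from group_b.
prob_a_greater = u_stat / (len(group_a) * len(group_b))

print(f"t = {t_stat:.3f}, U = {u_stat}, P(a > b) = {prob_a_greater}")
```

In this toy data the difference in means favours group_a, while only 5 of the 25 pairs have the group_a value larger, so the two statistics point in opposite directions.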
</p>projecteuclid.org/euclid.ssu/1266847666_Thu, 05 Aug 2010 15:41 EDT
A survey of cross-validation procedures for model selection
http://projecteuclid.org/euclid.ssu/1268143839
<strong>Sylvain Arlot</strong>, <strong>Alain Celisse</strong><p><strong>Source: </strong>Statist. Surv., Volume 4, 40--79.</p><p><strong>Abstract:</strong><br/>
Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its (apparent) universality. Many results exist on model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem in hand.
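As a rough illustration of cross-validation used for model selection, the sketch below estimates the prediction risk of two candidate models by V-fold cross-validation and keeps the one with the smaller estimate. The candidate models, the interleaved fold assignment, and the synthetic data are assumptions made for this example. They are not recommendations from the survey.

```python
from statistics import mean


def fit_constant(xs, ys):
    """Candidate 1: predict the training mean everywhere."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: simple least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_risk(fit, xs, ys, n_folds):
    """V-fold cross-validation estimate of the mean squared prediction error."""
    n = len(xs)
    total = 0.0
    for k in range(n_folds):
        test_idx = set(range(k, n, n_folds))  # interleaved folds
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        predict = fit(train_x, train_y)
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
    return total / n


# Synthetic data: a linear trend plus a small deterministic perturbation.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

candidates = {"constant": fit_constant, "line": fit_line}
risks = {name: cv_risk(fit, xs, ys, n_folds=5) for name, fit in candidates.items()}
best = min(risks, key=risks.get)
print(risks, best)
```

The number of folds is the main tuning choice here. The sketch fixes it at 5 purely for convenience.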
</p>projecteuclid.org/euclid.ssu/1268143839_Thu, 05 Aug 2010 15:41 EDT
Finite mixture models and model-based clustering
http://projecteuclid.org/euclid.ssu/1272547280
<strong>Volodymyr Melnykov</strong>, <strong>Ranjan Maitra</strong><p><strong>Source: </strong>Statist. Surv., Volume 4, 80--116.</p><p><strong>Abstract:</strong><br/>
Finite mixture models have a long history in statistics, having been used to model population heterogeneity, generalize distributional assumptions, and lately, for providing a convenient yet formal framework for clustering and classification. This paper provides a detailed review into mixture models and model-based clustering. Recent trends as well as open problems in the area are also discussed.
</p>projecteuclid.org/euclid.ssu/1272547280_Thu, 05 Aug 2010 15:41 EDT
Discrete variations of the fractional Brownian motion in the presence of outliers and an additive noise
http://projecteuclid.org/euclid.ssu/1276260873
<strong>Sophie Achard</strong>, <strong>Jean-François Coeurjolly</strong><p><strong>Source: </strong>Statist. Surv., Volume 4, 117--147.</p><p><strong>Abstract:</strong><br/>
This paper gives an overview of the problem of estimating the Hurst parameter of a fractional Brownian motion when the data are observed with outliers and/or with an additive noise by using methods based on discrete variations. We show that the classical estimation procedure based on the log-linearity of the variogram of dilated series is made more robust to outliers and/or an additive noise by considering sample quantiles and trimmed means of the squared series or differences of empirical variances. These different procedures are compared and discussed through a large simulation study and are implemented in the R package dvfBm.
</p>projecteuclid.org/euclid.ssu/1276260873_Thu, 05 Aug 2010 15:41 EDT
Primal and dual model representations in kernel-based learning
http://projecteuclid.org/euclid.ssu/1282746475
<strong>Johan A.K. Suykens</strong>, <strong>Carlos Alzate</strong>, <strong>Kristiaan Pelckmans</strong><p><strong>Source: </strong>Statist. Surv., Volume 4, 148--183.</p><p><strong>Abstract:</strong><br/>
This paper discusses the role of primal and (Lagrange) dual model representations in problems of supervised and unsupervised learning. The specification of the estimation problem is conceived at the primal level as a constrained optimization problem. The constraints relate to the model which is expressed in terms of the feature map. From the conditions for optimality one jointly finds the optimal model representation and the model estimate. At the dual level the model is expressed in terms of a positive definite kernel function, which is characteristic for a support vector machine methodology. It is discussed how least squares support vector machines are playing a central role as core models across problems of regression, classification, principal component analysis, spectral clustering, canonical correlation analysis, dimensionality reduction and data visualization.
</p>projecteuclid.org/euclid.ssu/1282746475_Wed, 25 Aug 2010 10:28 EDT
Identifying the consequences of dynamic treatment strategies: A decision-theoretic overview
http://projecteuclid.org/euclid.ssu/1289579930
<strong>A. Philip Dawid</strong>, <strong>Vanessa Didelez</strong><p><strong>Source: </strong>Statist. Surv., Volume 4, 184--231.</p><p><strong>Abstract:</strong><br/>
We consider the problem of learning about and comparing the consequences of dynamic treatment strategies on the basis of observational data. We formulate this within a probabilistic decision-theoretic framework. Our approach is compared with related work by Robins and others: in particular, we show how Robins’s ‘G-computation’ algorithm arises naturally from this decision-theoretic perspective. Careful attention is paid to the mathematical and substantive conditions required to justify the use of this formula. These conditions revolve around a property we term stability, which relates the probabilistic behaviours of observational and interventional regimes. We show how an assumption of ‘sequential randomization’ (or ‘no unmeasured confounders’), or an alternative assumption of ‘sequential irrelevance’, can be used to infer stability. Probabilistic influence diagrams are used to simplify manipulations, and their power and limitations are discussed. We compare our approach with alternative formulations based on causal DAGs or potential response models. We aim to show that formulating the problem of assessing dynamic treatment strategies as a problem of decision analysis brings clarity, simplicity and generality.
</p>projecteuclid.org/euclid.ssu/1289579930_Fri, 12 Nov 2010 11:39 EST
The ARMA alphabet soup: A tour of ARMA model variants
http://projecteuclid.org/euclid.ssu/1291731822
<strong>Scott H. Holan</strong>, <strong>Robert Lund</strong>, <strong>Ginger Davis</strong><p><strong>Source: </strong>Statist. Surv., Volume 4, 232--274.</p><p><strong>Abstract:</strong><br/>
Autoregressive moving-average (ARMA) difference equations are ubiquitous models for short memory time series and have parsimoniously described many stationary series. Variants of ARMA models have been proposed to describe more exotic series features such as long memory autocovariances, periodic autocovariances, and count support set structures. This review paper enumerates, compares, and contrasts the common variants of ARMA models in today’s literature. After the basic properties of ARMA models are reviewed, we tour ARMA variants that describe seasonal features, long memory behavior, multivariate series, changing variances (stochastic volatility) and integer counts. A list of ARMA variant acronyms is provided.
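For readers unfamiliar with the ARMA difference equation, here is a minimal sketch that simulates an ARMA(1,1) series, X_t = phi X_{t-1} + Z_t + theta Z_{t-1}, and compares its sample lag-1 autocorrelation with the standard closed-form value. The parameter values, seed, burn-in length, and function names are arbitrary choices for illustration and are not taken from the review.

```python
import random


def simulate_arma11(phi, theta, n, seed=0, burn_in=500):
    """Simulate X_t = phi*X_{t-1} + Z_t + theta*Z_{t-1} with N(0, 1) noise Z_t."""
    rng = random.Random(seed)
    x_prev, z_prev = 0.0, 0.0
    series = []
    for t in range(n + burn_in):
        z = rng.gauss(0.0, 1.0)
        x = phi * x_prev + z + theta * z_prev
        if t >= burn_in:  # discard the start-up transient
            series.append(x)
        x_prev, z_prev = x, z
    return series


def sample_acf(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    m = sum(series) / n
    c0 = sum((v - m) ** 2 for v in series)
    ck = sum((series[i] - m) * (series[i + lag] - m) for i in range(n - lag))
    return ck / c0


def theoretical_rho1(phi, theta):
    """Lag-1 autocorrelation of a causal ARMA(1,1) process."""
    return (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)


phi, theta = 0.5, 0.4
series = simulate_arma11(phi, theta, n=20000)
rho1_hat = sample_acf(series, 1)
rho1 = theoretical_rho1(phi, theta)
print(f"sample rho(1) = {rho1_hat:.3f}, theoretical rho(1) = {rho1:.3f}")
```

With |phi| < 1 the process is stationary and its autocorrelations decay geometrically, which is the short-memory behaviour the abstract refers to.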
</p><p><strong>References:</strong><br/>Aknouche, A. and Guerbyenne, H. (2006). Recursive estimation of GARCH models. <i>Communications in Statistics-Simulation and Computation</i> <b>35</b> 925–938.<br/><br/>Alzaid, A. A. and Al-Osh, M. (1990). An integer-valued <i>p</i>th-order autoregressive structure (INAR (<i>p</i>)) process. <i>Journal of Applied Probability</i> <b>27</b> 314–324.<br/><br/>Anderson, P. L., Tesfaye, Y. G. and Meerschaert, M. M. (2007). Fourier-PARMA models and their application to river flows. <i>Journal of Hydrologic Engineering</i> <b>12</b> 462–472.<br/><br/>Ansley, C. F. (1979). An algorithm for the exact likelihood of a mixed autoregressive-moving average process. <i>Biometrika</i> <b>66</b> 59–65.<br/><br/>Basawa, I. V. and Lund, R. (2001). Large sample properties of parameter estimates for periodic ARMA models. <i>Journal of Time Series Analysis</i> <b>22</b> 651–663.<br/><br/>Bauwens, L., Laurent, S. and Rombouts, J. V. K. (2006). Multivariate GARCH models: A survey. <i>Journal of Applied Econometrics</i> <b>21</b> 79–109.<br/><br/>Bertelli, S. and Caporin, M. (2002). A note on calculating autocovariances of long-memory processes. <i>Journal of Time Series Analysis</i> <b>23</b> 503–508.<br/><br/>Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. <i>Journal of Econometrics</i> <b>31</b> 307–327.<br/><br/>Bollerslev, T. (2008). Glossary to ARCH (GARCH). <i>CREATES Research Paper 2008-49</i>.<br/><br/>Bollerslev, T., Engle, R. F. and Wooldridge, J. M. (1988). A capital asset pricing model with time-varying covariances. <i>The Journal of Political Economy</i> <b>96</b> 116–131.<br/><br/>Bondon, P. and Palma, W. (2007). A class of antipersistent processes. <i>Journal of Time Series Analysis</i> <b>28</b> 261–273.<br/><br/>Bougerol, P. and Picard, N. (1992). Strict stationarity of generalized autoregressive processes. <i>The Annals of Probability</i> <b>20</b> 1714–1730.<br/><br/>Box, G. E. P., Jenkins, G. M. 
and Reinsel, G. C. (2008). <i>Time Series Analysis: Forecasting and Control</i>, 4th ed. Wiley, New Jersey.<br/><br/>Breidt, F. J., Davis, R. A. and Trindade, A. A. (2001). Least absolute deviation estimation for all-pass time series models. <i>Annals of Statistics</i> <b>29</b> 919–946.<br/><br/>Brockwell, P. J. (1994). On continuous-time threshold ARMA processes. <i>Journal of Statistical Planning and Inference</i> <b>39</b> 291–303.<br/><br/>Brockwell, P. J. (2001). Continuous-time ARMA processes. In <i>Stochastic Processes: Theory and Methods</i>, ( D. N. Shanbhag and C. R. Rao, eds.). <i>Handbook of Statistics</i> <b>19</b> 249–276. Elsevier.<br/><br/>Brockwell, P. J. and Davis, R. A. (1991). <i>Time Series: Theory and Methods</i>, 2nd ed. Springer, New York.<br/><br/>Brockwell, P. J. and Davis, R. A. (2002). <i>Introduction to Time Series and Forecasting</i>, 2nd ed. Springer, New York.<br/><br/>Brockwell, P. J. and Marquardt, T. (2005). Lévy-driven and fractionally integrated ARMA processes with continuous-time parameters. <i>Statistica Sinica</i> <b>15</b> 477–494.<br/><br/>Chan, K. S. (1990). Testing for threshold autoregression. <i>Annals of Statistics</i> <b>18</b> 1886–1894.<br/><br/>Chan, N. H. (2002). <i>Time Series: Applications to Finance</i>. John Wiley & Sons, New York.<br/><br/>Chan, N. H. and Palma, W. (1998). State space modeling of long-memory processes. <i>Annals of Statistics</i> <b>26</b> 719–740.<br/><br/>Chan, N. H. and Palma, W. (2006). Estimation of long-memory time series models: A survey of different likelihood-based methods. <i>Advances in Econometrics</i> <b>20</b> 89–121.<br/><br/>Chatfield, C. (2003). <i>The Analysis of Time Series: An Introduction</i>, 6th ed. Chapman & Hall/CRC, Boca Raton.<br/><br/>Chen, W., Hurvich, C. M. and Lu, Y. (2006). On the correlation matrix of the discrete Fourier transform and the fast solution of large Toeplitz systems for long-memory time series. 
<i>Journal of the American Statistical Association</i> <b>101</b> 812–822.<br/><br/>Chernick, M. R., Hsing, T. and McCormick, W. P. (1991). Calculating the extremal index for a class of stationary sequences. <i>Advances in Applied Probability</i> <b>23</b> 835–850.<br/><br/>Chib, S., Nardari, F. and Shephard, N. (2006). Analysis of high dimensional multivariate stochastic volatility models. <i>Journal of Econometrics</i> <b>134</b> 341–371.<br/><br/>Cryer, J. D. and Chan, K. S. (2008). <i>Time Series Analysis: With Applications in R</i>. Springer, New York.<br/><br/>Cui, Y. and Lund, R. (2009). A new look at time series of counts. <i>Biometrika</i> <b>96</b> 781–792.<br/><br/>Davis, R. A., Dunsmuir, W. T. M. and Wang, Y. (1999). Modeling time series of count data. In <i>Asymptotics, Nonparametrics and Time Series</i>, ( S. Ghosh, ed.). <i>Statistics Textbooks Monograph</i> 63–113. Marcel Dekker, New York.<br/><br/>Davis, R. A., Dunsmuir, W. and Streett, S. B. (2003). Observation-driven models for Poisson counts. <i>Biometrika</i> <b>90</b> 777–790.<br/><br/>Davis, R. A. and Resnick, S. I. (1996). Limit theory for bilinear processes with heavy-tailed noise. <i>The Annals of Applied Probability</i> <b>6</b> 1191–1210.<br/><br/>Deistler, M. and Hannan, E. J. (1981). Some properties of the parameterization of ARMA systems with unknown order. <i>Journal of Multivariate Analysis</i> <b>11</b> 474–484.<br/><br/>Dufour, J. M. and Jouini, T. (2005). Asymptotic distribution of a simple linear estimator for VARMA models in echelon form. <i>Statistical Modeling and Analysis for Complex Data Problems</i> 209–240.<br/><br/>Dunsmuir, W. and Hannan, E. J. (1976). Vector linear time series models. <i>Advances in Applied Probability</i> <b>8</b> 339–364.<br/><br/>Durbin, J. and Koopman, S. J. (2001). <i>Time Series Analysis by State Space Methods</i>. Oxford University Press, Oxford.<br/><br/>Engle, R. F. (1982). 
Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. <i>Econometrica</i> <b>50</b> 987–1007.<br/><br/>Engle, R. F. (2002). Dynamic conditional correlation. <i>Journal of Business and Economic Statistics</i> <b>20</b> 339–350.<br/><br/>Engle, R. F. and Bollerslev, T. (1986). Modelling the persistence of conditional variances. <i>Econometric Reviews</i> <b>5</b> 1–50.<br/><br/>Fuller, W. A. (1996). <i>Introduction to Statistical Time Series</i>, 2nd ed. John Wiley & Sons, New York.<br/><br/>Geweke, J. and Porter-Hudak, S. (1983). The estimation and application of long memory time series models. <i>Journal of Time Series Analysis</i> <b>4</b> 221–238.<br/><br/>Gladyšhev, E. G. (1961). Periodically correlated random sequences. <i>Soviet Math</i> <b>2</b> 385–388.<br/><br/>Granger, C. W. J. (1982). Acronyms in time series analysis (ATSA). <i>Journal of Time Series Analysis</i> <b>3</b> 103–107.<br/><br/>Granger, C. W. J. and Andersen, A. P. (1978). <i>An Introduction to Bilinear Time Series Models</i>. Vandenhoeck and Ruprecht Göttingen.<br/><br/>Granger, C. W. J. and Joyeux, R. (1980). An introduction to long-memory time series models and fractional differencing. <i>Journal of Time Series Analysis</i> <b>1</b> 15–29.<br/><br/>Gray, H. L., Zhang, N. F. and Woodward, W. A. (1989). On generalized fractional processes. <i>Journal of Time Series Analysis</i> <b>10</b> 233–257.<br/><br/>Hamilton, J. D. (1994). <i>Time Series Analysis</i>. Princeton University Press, Princeton, New Jersey.<br/><br/>Hannan, E. J. (1955). A test for singularities in Sydney rainfall. <i>Australian Journal of Physics</i> <b>8</b> 289–297.<br/><br/>Hannan, E. J. (1969). The identification of vector mixed autoregressive-moving average system. <i>Biometrika</i> <b>56</b> 223–225.<br/><br/>Hannan, E. J. (1970). <i>Multiple Time Series</i>. John Wiley & Sons, New York.<br/><br/>Hannan, E. J. (1976). 
The identification and parameterization of ARMAX and state space forms. <i>Econometrica</i> <b>44</b> 713–723.<br/><br/>Hannan, E. J. (1979). The Statistical Theory of Linear Systems. In <i>Developments in Statistics</i> ( P. R. Krishnaiah, ed.) 83–121. Academic Press, New York.<br/><br/>Hannan, E. J. and Deistler, M. (1987). <i>The Statistical Theory of Linear Systems</i>. John Wiley & Sons, New York.<br/><br/>Harvey, A. C. (1989). <i>Forecasting, Structural Time Series Models and the Kalman Filter</i>. Cambridge University Press, Cambridge.<br/><br/>Haslett, J. and Raftery, A. E. (1989). Space-time modelling with long-memory dependence: Assessing Ireland’s wind power resource. <i>Applied Statistics</i> <b>38</b> 1–50.<br/><br/>Hosking, J. R. M. (1981). Fractional differencing. <i>Biometrika</i> <b>68</b> 165–176.<br/><br/>Hui, Y. V. and Li, W. K. (1995). On fractionally differenced periodic processes. <i>Sankhyā: The Indian Journal of Statistics, Series B</i> <b>57</b> 19–31.<br/><br/>Jacobs, P. A. and Lewis, P. A. W. (1978a). Discrete time series generated by mixtures. I: Correlational and runs properties. <i>Journal of the Royal Statistical Society. Series B (Methodological)</i> <b>40</b> 94–105.<br/><br/>Jacobs, P. A. and Lewis, P. A. W. (1978b). Discrete time series generated by mixtures II: Asymptotic properties. <i>Journal of the Royal Statistical Society. Series B (Methodological)</i> <b>40</b> 222–228.<br/><br/>Jacobs, P. A. and Lewis, P. A. W. (1983). Stationary discrete autoregressive-moving average time series generated by mixtures. <i>Journal of Time Series Analysis</i> <b>4</b> 19–36.<br/><br/>Jones, R. H. (1980). Maximum likelihood fitting of ARMA models to time series with missing observations. <i>Technometrics</i> <b>22</b> 389–395.<br/><br/>Jones, R. H. and Brelsford, W. M. (1967). Time series with periodic structure. <i>Biometrika</i> <b>54</b> 403–408.<br/><br/>Kedem, B. and Fokianos, K. (2002). 
<i>Regression Models for Time Series Analysis</i>. John Wiley & Sons, New Jersey.<br/><br/>Ko, K. and Vannucci, M. (2006). Bayesian wavelet-based methods for the detection of multiple changes of the long memory parameter. <i>IEEE Transactions on Signal Processing</i> <b>54</b> 4461–4470.<br/><br/>Kohn, R. (1979). Asymptotic estimation and hypothesis testing results for vector linear time series models. <i>Econometrica</i> <b>47</b> 1005–1030.<br/><br/>Kokoszka, P. S. and Taqqu, M. S. (1995). Fractional ARIMA with stable innovations. <i>Stochastic Processes and their Applications</i> <b>60</b> 19–47.<br/><br/>Kokoszka, P. S. and Taqqu, M. S. (1996). Parameter estimation for infinite variance fractional ARIMA. <i>Annals of Statistics</i> <b>24</b> 1880–1913.<br/><br/>Lawrance, A. J. and Lewis, P. A. W. (1980). The exponential autoregressive-moving average EARMA(<i>p</i>,<i>q</i>) process. <i>Journal of the Royal Statistical Society. Series B (Methodological)</i> <b>42</b> 150–161.<br/><br/>Ling, S. and Li, W. K. (1997). On fractionally integrated autoregressive moving-average time series models with conditional heteroscedasticity. <i>Journal of the American Statistical Association</i> <b>92</b> 1184–1194.<br/><br/>Liu, J. and Brockwell, P. J. (1988). On the general bilinear time series model. <i>Journal of Applied Probability</i> <b>25</b> 553–564.<br/><br/>Lund, R. and Basawa, I. V. (2000). Recursive prediction and likelihood evaluation for periodic ARMA models. <i>Journal of Time Series Analysis</i> <b>21</b> 75–93.<br/><br/>Lund, R., Shao, Q. and Basawa, I. (2006). Parsimonious periodic time series modeling. <i>Australian & New Zealand Journal of Statistics</i> <b>48</b> 33–47.<br/><br/>Lütkepohl, H. (1991). <i>Introduction to Multiple Time Series Analysis</i>. Springer-Verlag, New York.<br/><br/>Lütkepohl, H. (2005). <i>New Introduction to Multiple Time Series Analysis</i>. Springer, New York.<br/><br/>MacDonald, I. L. and Zucchini, W. (1997). 
<i>Hidden Markov and Other Models for Discrete-Valued Time Series</i>. Chapman & Hall/CRC, Boca Raton.<br/><br/>Mann, H. B. and Wald, A. (1943). On the statistical treatment of linear stochastic difference equations. <i>Econometrica</i> <b>11</b> 173–220.<br/><br/>Marriott, J., Ravishanker, N., Gelfand, A. and Pai, J. (1996). Bayesian analysis of ARMA processes: Complete sampling-based inference under exact likelihoods. In <i>Bayesian Analysis in Statistics and Econometrics: Essays in Honor of Arnold Zellner</i> ( D. Berry, K. Chaloner and J. Geweke, eds.) 243–256. Wiley, New York.<br/><br/>McKenzie, E. (1988). Some ARMA models for dependent sequences of Poisson counts. <i>Advances in Applied Probability</i> <b>20</b> 822–835.<br/><br/>Mikosch, T. and Starica, C. (2004). Nonstationarities in financial time series, the long-range dependence, and the IGARCH effects. <i>Review of Economics and Statistics</i> <b>86</b> 378–390.<br/><br/>Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. <i>Econometrica</i> <b>59</b> 347–370.<br/><br/>Nelson, D. B. and Cao, C. Q. (1992). Inequality constraints in the univariate GARCH model. <i>Journal of Business and Economic Statistics</i> <b>10</b> 229–235.<br/><br/>Ooms, M. and Franses, P. H. (2001). A seasonal periodic long memory model for monthly river flows. <i>Environmental Modelling & Software</i> <b>16</b> 559–569.<br/><br/>Pagano, M. (1978). On periodic and multiple autoregressions. <i>Annals of Statistics</i> <b>6</b> 1310–1317.<br/><br/>Pai, J. S. and Ravishanker, N. (1998). Bayesian analysis of autoregressive fractionally integrated moving-average processes. <i>Journal of Time Series Analysis</i> <b>19</b> 99–112.<br/><br/>Palma, W. (2007). <i>Long-Memory Time Series: Theory and Methods</i>. John Wiley & Sons, New Jersey.<br/><br/>Palma, W. and Chan, N. H. (2005). Efficient estimation of seasonal long-range-dependent processes. 
<i>Journal of Time Series Analysis</i> <b>26</b> 863–892.<br/><br/>Pfeifer, P. E. and Deutsch, S. J. (1980). A three-stage iterative procedure for space-time modeling. <i>Technometrics</i> <b>22</b> 35–47.<br/><br/>Prado, R. and West, M. (2010). <i>Time Series Modeling, Computation and Inference</i>. Chapman & Hall/CRC, Boca Raton.<br/><br/>Quoreshi, A. M. M. S. (2008). A long memory count data time series model for financial application. <i>Preprint</i>.<br/><br/>R Development Core Team, (2010). R: A Language and Environment for Statistical Computing. <a href="http://www.R-project.org">http://www.R-project.org</a>.<br/><br/>Ravishanker, N. and Ray, B. K. (1997). Bayesian analysis of vector ARMA models using Gibbs sampling. <i>Journal of Forecasting</i> <b>16</b> 177–194.<br/><br/>Ravishanker, N. and Ray, B. K. (2002). Bayesian prediction for vector ARFIMA processes. <i>International Journal of Forecasting</i> <b>18</b> 207–214.<br/><br/>Reinsel, G. C. (1997). <i>Elements of Multivariate Time Series Analysis</i>. Springer, New York.<br/><br/>Resnick, S. I. and Willekens, E. (1991). Moving averages with random coefficients and random coefficient autoregressive models. <i>Communications in Statistics. Stochastic Models</i> <b>7</b> 511–525.<br/><br/>Rootzén, H. (1986). Extreme value theory for moving average processes. <i>The Annals of Probability</i> <b>14</b> 612–652.<br/><br/>Scotto, M. G. (2007). Extremes for solutions to stochastic difference equations with regularly varying tails. <i>REVSTAT–Statistical Journal</i> <b>5</b> 229–247.<br/><br/>Shao, Q. and Lund, R. (2004). Computation and characterization of autocorrelations and partial autocorrelations in periodic ARMA models. <i>Journal of Time Series Analysis</i> <b>25</b> 359–372.<br/><br/>Shumway, R. H. and Stoffer, D. S. (2006). <i>Time Series Analysis and its Applications: With R Examples</i>, 2nd ed. Springer, New York.<br/><br/>Silvennoinen, A. and Teräsvirta, T. (2009). Multivariate GARCH models. 
In <i>Handbook of Financial Time Series</i> ( T. Andersen, R. Davis, J.-P. Kreiß, and T. Mikosch, eds.) Springer, New York.<br/><br/>Sowell, F. (1992). Maximum likelihood estimation of stationary univariate fractionally integrated time series models. <i>Journal of Econometrics</i> <b>53</b> 165–188.<br/><br/>Startz, R. (2008). Binomial autoregressive moving average models with an application to U.S. recessions. <i>Journal of Business and Economic Statistics</i> <b>26</b> 1–8.<br/><br/>Stramer, O., Tweedie, R. L. and Brockwell, P. J. (1996). Existence and stability of continuous time threshold ARMA processes. <i>Statistica Sinica</i> <b>6</b> 715–732.<br/><br/>Subba Rao, T. (1981). On the theory of bilinear time series models. <i>Journal of the Royal Statistical Society. Series B (Methodological)</i> <b>43</b> 244–255.<br/><br/>Tong, H. and Lim, K. S. (1980). Threshold autoregression, limit cycles and cyclical data. <i>Journal of the Royal Statistical Society. Series B (Methodological)</i> <b>42</b> 245–292.<br/><br/>Troutman, B. M. (1979). Some results in periodic autoregression. <i>Biometrika</i> <b>66</b> 219–228.<br/><br/>Tsai, H. (2009). On continuous-time autoregressive fractionally integrated moving average processes. <i>Bernoulli</i> <b>15</b> 178–194.<br/><br/>Tsai, H. and Chan, K. S. (2000). A note on the covariance structure of a continuous-time ARMA process. <i>Statistica Sinica</i> <b>10</b> 989–998.<br/><br/>Tsai, H. and Chan, K. S. (2005). Maximum likelihood estimation of linear continuous time long memory processes with discrete time data. <i>Journal of the Royal Statistical Society. Series B (Statistical Methodology)</i> <b>67</b> 703–716.<br/><br/>Tsai, H. and Chan, K. S. (2008). A note on inequality constraints in the GARCH model. <i>Econometric Theory</i> <b>24</b> 823–828.<br/><br/>Tsay, R. S. (1989). Parsimonious parameterization of vector autoregressive moving average models. 
<i>Journal of Business and Economic Statistics</i> <b>7</b> 327–341.<br/><br/>Tunnicliffe-Wilson, G. (1979). Some efficient computational procedures for high order ARMA models. <i>Journal of Statistical Computation and Simulation</i> <b>8</b> 301–309.<br/><br/>Ursu, E. and Duchesne, P. (2009). On modelling and diagnostic checking of vector periodic autoregressive time series models. <i>Journal of Time Series Analysis</i> <b>30</b> 70–96.<br/><br/>Vecchia, A. V. (1985a). Maximum likelihood estimation for periodic autoregressive moving average models. <i>Technometrics</i> <b>27</b> 375–384.<br/><br/>Vecchia, A. V. (1985b). Periodic autoregressive-moving average (PARMA) modeling with applications to water resources. <i>Journal of the American Water Resources Association</i> <b>21</b> 721–730.<br/><br/>Vidakovic, B. (1999). <i>Statistical Modeling by Wavelets</i>. John Wiley & Sons, New York.<br/><br/>West, M. and Harrison, J. (1997). <i>Bayesian Forecasting and Dynamic Models</i>, 2nd ed. Springer, New York.<br/><br/>Wold, H. (1954). <i>A Study in the Analysis of Stationary Time Series</i>. Almqvist & Wiksell, Stockholm.<br/><br/>Woodward, W. A., Cheng, Q. C. and Gray, H. L. (1998). A <i>k</i>-factor GARMA long-memory model. <i>Journal of Time Series Analysis</i> <b>19</b> 485–504.<br/><br/>Zivot, E. and Wang, J. (2006). <i>Modeling Financial Time Series with S-PLUS</i>, 2nd ed. Springer, New York.<br/><br/></p>projecteuclid.org/euclid.ssu/1291731822_Tue, 07 Dec 2010 09:23 ESTTue, 07 Dec 2010 09:23 ESTData confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacyhttp://projecteuclid.org/euclid.ssu/1296828958<strong>Gregory J. Matthews</strong>, <strong>Ofer Harel</strong><p><strong>Source: </strong>Statist. Surv., Volume 5, 1--29.</p><p><strong>Abstract:</strong><br/>
There is an ever-increasing demand from researchers for access to useful microdata files. However, there are also growing concerns regarding the privacy of the individuals represented in the microdata. Ideally, microdata would be released in a way that strikes a balance between the usefulness of the data and privacy. This paper reviews proposed methods of statistical disclosure control and techniques for assessing the privacy of such methods under different definitions of disclosure.
</p>projecteuclid.org/euclid.ssu/1296828958_Fri, 04 Feb 2011 09:16 ESTFri, 04 Feb 2011 09:16 ESTCurse of dimensionality and related issues in nonparametric functional regressionhttp://projecteuclid.org/euclid.ssu/1302783447<strong>Gery Geenens</strong><p><strong>Source: </strong>Statist. Surv., Volume 5, 30--43.</p><p><strong>Abstract:</strong><br/>
Recently, some nonparametric regression ideas have been extended to the case of functional regression. Within that framework, the main concern arises from the infinite-dimensional nature of the explanatory objects. Specifically, in the classical multivariate regression context, it is well known that any nonparametric method is affected by the so-called “curse of dimensionality”, caused by the sparsity of data in high-dimensional spaces, resulting in a decrease in the fastest achievable rates of convergence of regression function estimators toward their target curve as the dimension of the regressor vector increases. Therefore, it is not surprising to find dramatically bad theoretical properties for the nonparametric functional regression estimators, leading many authors to condemn the methodology. Nevertheless, a closer look at the meaning of the functional data under study and at the conclusions that the statistician would like to draw from them allows the problem to be considered from another point of view, and justifies the use of slightly modified estimators. In most cases, it can be entirely legitimate to measure the proximity between two elements of the infinite-dimensional functional space via a semi-metric, which could prevent those estimators from suffering from what we will call the “curse of infinite dimensionality”.
</p>projecteuclid.org/euclid.ssu/1302783447_Thu, 14 Apr 2011 08:17 EDTThu, 14 Apr 2011 08:17 EDTA review of survival treeshttp://projecteuclid.org/euclid.ssu/1315833185<strong>Imad Bou-Hamad</strong>, <strong>Denis Larocque</strong>, <strong>Hatem Ben-Ameur</strong><p><strong>Source: </strong>Statist. Surv., Volume 5, 44--71.</p><p><strong>Abstract:</strong><br/>
This paper presents a non-technical account of the developments in tree-based methods for the analysis of survival data with censoring. The review describes the initial developments, which mainly extended the existing basic tree methodologies to censored data, as well as more recent work. We also cover more complex models, more specialized methods, and more specific problems such as multivariate data, the use of time-varying covariates, discrete-scale survival data, and ensemble methods applied to survival trees. A data example is used to illustrate some methods that are implemented in R.
</p>projecteuclid.org/euclid.ssu/1315833185_Mon, 12 Sep 2011 09:13 EDTMon, 12 Sep 2011 09:13 EDTPrediction in several conventional contextshttp://projecteuclid.org/euclid.ssu/1336481369<strong>Bertrand Clarke</strong>, <strong>Jennifer Clarke</strong><p><strong>Source: </strong>Statist. Surv., Volume 6, 1--73.</p><p><strong>Abstract:</strong><br/>
We review predictive techniques from several traditional branches of statistics. Starting with prediction based on the normal model and on the empirical distribution function, we proceed to techniques for various forms of regression and classification. Then, we turn to time series, longitudinal data, and survival analysis. Our focus throughout is on the mechanics of prediction more than on the properties of predictors.
</p>projecteuclid.org/euclid.ssu/1336481369_Tue, 08 May 2012 08:50 EDTTue, 08 May 2012 08:50 EDTStatistical inference for disordered sphere packingshttp://projecteuclid.org/euclid.ssu/1342701400<strong>Jeffrey Picka</strong><p><strong>Source: </strong>Statist. Surv., Volume 6, 74--112.</p><p><strong>Abstract:</strong><br/>
This paper gives an overview of statistical inference for disordered sphere packing processes. These processes are used extensively in physics and engineering in order to represent the internal structure of composite materials, packed bed reactors, and powders at rest, and are used as initial arrangements of grains in the study of avalanches and other problems involving powders in motion. Packing processes are spatial processes which are neither stationary nor ergodic. Classical spatial statistical models and procedures cannot be applied to these processes, but alternative models and procedures can be developed based on ideas from statistical physics.
Most of the development of models and statistics for sphere packings has been undertaken by scientists and engineers. This review summarizes their results from an inferential perspective.
</p>projecteuclid.org/euclid.ssu/1342701400_Thu, 19 Jul 2012 08:37 EDTThu, 19 Jul 2012 08:37 EDTThe theory and application of penalized methods or Reproducing Kernel Hilbert Spaces made easyhttp://projecteuclid.org/euclid.ssu/1350394596<strong>Nancy Heckman</strong><p><strong>Source: </strong>Statist. Surv., Volume 6, 113--141.</p><p><strong>Abstract:</strong><br/>
The popular cubic smoothing spline estimate of a regression function arises as the minimizer of the penalized sum of squares $\sum_{j}(Y_{j}-\mu(t_{j}))^{2}+\lambda \int_{a}^{b}[\mu''(t)]^{2}\,dt$, where the data are $t_{j},Y_{j}$, $j=1,\ldots,n$. The minimization is taken over an infinite-dimensional function space, the space of all functions with square integrable second derivatives. But the calculations can be carried out in a finite-dimensional space. The reduction from minimizing over an infinite dimensional space to minimizing over a finite dimensional space occurs for more general objective functions: the data may be related to the function $\mu$ in another way, the sum of squares may be replaced by a more suitable expression, or the penalty, $\int_{a}^{b}[\mu''(t)]^{2}\,dt$, might take a different form. This paper reviews the Reproducing Kernel Hilbert Space structure that provides a finite-dimensional solution for a general minimization problem. Particular attention is paid to the construction and study of the Reproducing Kernel Hilbert Space corresponding to a penalty based on a linear differential operator. In this case, one can often calculate the minimizer explicitly, using Green’s functions.
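As a sketch of the finite-dimensional reduction described above, consider the cubic smoothing spline case. The notation $B_{k}$, $B$, $\Omega$ and $\hat{\beta}$ is an assumption made here for illustration and is not taken from the paper. Assuming distinct $t_{j}$ and $\lambda>0$, the minimizer is a natural cubic spline with knots at the $t_{j}$, so it can be expanded in a basis $B_{1},\ldots,B_{n}$ of such splines and the problem becomes a ridge-type least squares problem:

```latex
\hat{\mu}(t)=\sum_{k=1}^{n}\hat{\beta}_{k}B_{k}(t),
\qquad
\hat{\beta}
=\arg\min_{\beta\in\mathbb{R}^{n}}
\Bigl\{\|Y-B\beta\|^{2}+\lambda\,\beta^{\top}\Omega\,\beta\Bigr\}
=(B^{\top}B+\lambda\Omega)^{-1}B^{\top}Y,
\\[4pt]
\text{where } B_{jk}=B_{k}(t_{j})
\text{ and }
\Omega_{kl}=\int_{a}^{b}B_{k}''(t)\,B_{l}''(t)\,dt .
```

The infinite-dimensional search over $\mu$ is thus replaced by solving an $n\times n$ linear system for $\hat{\beta}$.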
</p>projecteuclid.org/euclid.ssu/1350394596_Tue, 16 Oct 2012 09:36 EDTTue, 16 Oct 2012 09:36 EDTA survey of Bayesian predictive methods for model assessment, selection and comparisonhttp://projecteuclid.org/euclid.ssu/1356628931<strong>Aki Vehtari</strong>, <strong>Janne Ojanen</strong><p><strong>Source: </strong>Statist. Surv., Volume 6, 142--228.</p><p><strong>Abstract:</strong><br/>
Several methods for model assessment in the statistical literature present themselves specifically as Bayesian predictive methods. However, the decision theoretic assumptions on which these methods are based are not always clearly stated in the original articles. The aim of this survey is to provide a unified review of Bayesian predictive model assessment and selection methods, and of methods closely related to them. We review the various assumptions that are made in this context and discuss the connections between different approaches, with an emphasis on how each method approximates the expected utility of using a Bayesian model for the purpose of predicting future data.
</p>projecteuclid.org/euclid.ssu/1356628931_Thu, 27 Dec 2012 12:22 ESTThu, 27 Dec 2012 12:22 ESTAnalyzing complex functional brain networks: Fusing statistics and network science to understand the brainhttp://projecteuclid.org/euclid.ssu/1382965566<strong>Sean L. Simpson</strong>, <strong>F. DuBois Bowman</strong>, <strong>Paul J. Laurienti</strong><p><strong>Source: </strong>Statist. Surv., Volume 7, 1--36.</p><p><strong>Abstract:</strong><br/>
Complex functional brain network analyses have exploded over the last decade, gaining traction due to their profound clinical implications. The application of network science (an interdisciplinary offshoot of graph theory) has facilitated these analyses and enabled examining the brain as an integrated system that produces complex behaviors. While the field of statistics has been integral in advancing activation analyses and some connectivity analyses in functional neuroimaging research, it has yet to play a commensurate role in complex network analyses. Fusing novel statistical methods with network-based functional neuroimage analysis will engender powerful analytical tools that will aid in our understanding of normal brain function as well as alterations due to various brain disorders. Here we survey widely used statistical and network science tools for analyzing fMRI network data and discuss the challenges faced in filling some of the remaining methodological gaps. When applied and interpreted correctly, the fusion of network scientific and statistical methods has a chance to revolutionize the understanding of brain function.
</p>projecteuclid.org/euclid.ssu/1382965566_Mon, 28 Oct 2013 09:06 EDTMon, 28 Oct 2013 09:06 EDTErrata: A survey of Bayesian predictive methods for model assessment, selection and comparisonhttp://projecteuclid.org/euclid.ssu/1393423808<strong>Aki Vehtari</strong>, <strong>Janne Ojanen</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 8, 1--1.</p><p><strong>Abstract:</strong><br/>
Errata for “A survey of Bayesian predictive methods for model assessment, selection and comparison” by A. Vehtari and J. Ojanen, Statistics Surveys, 6 (2012), 142–228. doi:10.1214/12-SS102.
</p>projecteuclid.org/euclid.ssu/1393423808_20140226091022Wed, 26 Feb 2014 09:10 ESTAdaptive clinical trial designs for phase I cancer studieshttp://projecteuclid.org/euclid.ssu/1401369114<strong>Oleksandr Sverdlov</strong>, <strong>Weng Kee Wong</strong>, <strong>Yevgen Ryeznik</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 8, 2--44.</p><p><strong>Abstract:</strong><br/>
Adaptive clinical trials are becoming increasingly popular research designs for clinical investigation. Adaptive designs are particularly useful in phase I cancer studies where clinical data are scant and the goals are to assess the drug dose-toxicity profile and to determine the maximum tolerated dose while minimizing the number of study patients treated at suboptimal dose levels.
In the current work we give an overview of adaptive design methods for phase I cancer trials. We find that modern statistical literature is replete with novel adaptive designs that have clearly defined objectives and established statistical properties, and are shown to outperform conventional dose finding methods such as the 3+3 design, both in terms of statistical efficiency and in terms of minimizing the number of patients treated at highly toxic or nonefficacious doses. We discuss statistical, logistical, and regulatory aspects of these designs and present some links to non-commercial statistical software for implementing these methods in practice.
</p>projecteuclid.org/euclid.ssu/1401369114_20140529091158Thu, 29 May 2014 09:11 EDTLog-concavity and strong log-concavity: A reviewhttp://projecteuclid.org/euclid.ssu/1418134163<strong>Adrien Saumard</strong>, <strong>Jon A. Wellner</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 8, 45--114.</p><p><strong>Abstract:</strong><br/>
We review and formulate results concerning log-concavity and strong-log-concavity in both discrete and continuous settings. We show how preservation of log-concavity and strong log-concavity on $\mathbb{R}$ under convolution follows from a fundamental monotonicity result of Efron (1965). We provide a new proof of Efron’s theorem using the recent asymmetric Brascamp-Lieb inequality due to Otto and Menz (2013). Along the way we review connections between log-concavity and other areas of mathematics and statistics, including concentration of measure, log-Sobolev inequalities, convex geometry, MCMC algorithms, Laplace approximations, and machine learning.
</p>projecteuclid.org/euclid.ssu/1418134163_20141209090925Tue, 09 Dec 2014 09:09 ESTSemi-parametric estimation for conditional independence multivariate finite mixture modelshttp://projecteuclid.org/euclid.ssu/1423229941<strong>Didier Chauveau</strong>, <strong>David R. Hunter</strong>, <strong>Michael Levine</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 9, 1--31.</p><p><strong>Abstract:</strong><br/>
The conditional independence assumption for nonparametric multivariate finite mixture models, a weaker form of the well-known conditional independence assumption for random effects models for longitudinal data, is the subject of an increasing number of theoretical and algorithmic developments in the statistical literature. After presenting a survey of this literature, including an in-depth discussion of the all-important identifiability results, this article describes and extends an algorithm for estimation of the parameters in these models. The algorithm works for any number of components in three or more dimensions. It possesses a descent property and can be easily adapted to situations where the data are grouped in blocks of conditionally independent variables. We discuss how to adapt this algorithm to various location-scale models that link component densities, and we even adapt it to a particular class of univariate mixture problems in which the components are assumed symmetric. We give a bandwidth selection procedure for our algorithm. Finally, we demonstrate the effectiveness of our algorithm using a simulation study and two psychometric datasets.
</p>projecteuclid.org/euclid.ssu/1423229941_20150206083903Fri, 06 Feb 2015 08:39 EST$M$-functionals of multivariate scatterhttp://projecteuclid.org/euclid.ssu/1426857094<strong>Lutz Dümbgen</strong>, <strong>Markus Pauly</strong>, <strong>Thomas Schweizer</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 9, 32--105.</p><p><strong>Abstract:</strong><br/>
This survey provides a self-contained account of $M$-estimation of multivariate scatter. In particular, we present new proofs for existence of the underlying $M$-functionals and discuss their weak continuity and differentiability. This is done in a rather general framework with matrix-valued random variables. By doing so we reveal a connection between Tyler’s (1987a) $M$-functional of scatter and the estimation of proportional covariance matrices. Moreover, this general framework allows us to treat a new class of scatter estimators, based on symmetrizations of arbitrary order. Finally these results are applied to $M$-estimation of multivariate location and scatter via multivariate $t$-distributions.
</p>projecteuclid.org/euclid.ssu/1426857094_20150320091136Fri, 20 Mar 2015 09:11 EDTSome models and methods for the analysis of observational datahttp://projecteuclid.org/euclid.ssu/1442364037<strong>José A. Ferreira</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 9, 106--208.</p><p><strong>Abstract:</strong><br/>
This article provides a concise and essentially self-contained exposition of some of the most important models and non-parametric methods for the analysis of observational data, and a substantial number of illustrations of their application. Although for the most part our presentation follows P. Rosenbaum’s book, “Observational Studies”, and naturally draws on related literature, it contains original elements and simplifies and generalizes some basic results. The illustrations, based on simulated data, show the methods at work in some detail, highlighting pitfalls and emphasizing certain subjective aspects of the statistical analyses.
</p>projecteuclid.org/euclid.ssu/1442364037_20150915204040Tue, 15 Sep 2015 20:40 EDTStatistical inference for dynamical systems: A reviewhttp://projecteuclid.org/euclid.ssu/1447165229<strong>Kevin McGoff</strong>, <strong>Sayan Mukherjee</strong>, <strong>Natesh Pillai</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 9, 209--252.</p><p><strong>Abstract:</strong><br/>
The topic of statistical inference for dynamical systems has been studied widely across several fields. In this survey we focus on methods related to parameter estimation for nonlinear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research.
</p>projecteuclid.org/euclid.ssu/1447165229_20151110092034Tue, 10 Nov 2015 09:20 ESTA unified treatment for non-asymptotic and asymptotic approaches to minimax signal detectionhttp://projecteuclid.org/euclid.ssu/1453212290<strong>Clément Marteau</strong>, <strong>Theofanis Sapatinas</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 9, 253--297.</p><p><strong>Abstract:</strong><br/>
We are concerned with minimax signal detection. In this setting, we discuss non-asymptotic and asymptotic approaches through a unified treatment. In particular, we consider a Gaussian sequence model that contains classical models as special cases, such as direct, well-posed inverse and ill-posed inverse problems. Working with certain ellipsoids in the space of square-summable sequences of real numbers, with a ball of positive radius removed, we compare the construction of lower and upper bounds for the minimax separation radius (non-asymptotic approach) and the minimax separation rate (asymptotic approach) that have been proposed in the literature. Some additional contributions, bringing to light links between non-asymptotic and asymptotic approaches to minimax signal detection, are also presented. An example of a mildly ill-posed inverse problem is used for illustrative purposes. In particular, it is shown that tools used to derive ‘asymptotic’ results can be exploited to draw ‘non-asymptotic’ conclusions, and vice-versa.
In order to enhance our understanding of these two minimax signal detection paradigms, we bring to light hitherto unknown similarities and links between non-asymptotic and asymptotic approaches.
</p>projecteuclid.org/euclid.ssu/1453212290_20160119090454Tue, 19 Jan 2016 09:04 ESTA survey of bootstrap methods in finite population samplinghttp://projecteuclid.org/euclid.ssu/1458047831<strong>Zeinab Mashreghi</strong>, <strong>David Haziza</strong>, <strong>Christian Léger</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 10, 1--52.</p><p><strong>Abstract:</strong><br/>
We review bootstrap methods in the context of survey data where the effect of the sampling design on the variability of estimators has to be taken into account. We present the methods in a unified way by classifying them in three classes: pseudo-population, direct, and survey weights methods. We cover variance estimation and the construction of confidence intervals for stratified simple random sampling as well as some unequal probability sampling designs. We also address the problem of variance estimation in presence of imputation to compensate for item non-response.
</p>projecteuclid.org/euclid.ssu/1458047831_20160315091713Tue, 15 Mar 2016 09:17 EDTFundamentals of cone regressionhttp://projecteuclid.org/euclid.ssu/1463663054<strong>Mariella Dimiccoli</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 10, 53--99.</p><p><strong>Abstract:</strong><br/>
Cone regression is a particular case of quadratic programming that minimizes a weighted sum of squared residuals under a set of linear inequality constraints. Several important statistical problems, such as isotonic regression, concave regression and ANOVA under partial orderings, to name a few, can be considered as particular instances of the cone regression problem. Given its relevance in Statistics, this paper aims to address the fundamentals of cone regression from a theoretical and practical point of view. Several formulations of the cone regression problem are considered and, focusing on the particular case of concave regression as an example, several algorithms are analyzed and compared both qualitatively and quantitatively through numerical simulations. Several improvements to enhance numerical stability and bound the computational cost are proposed. For each analyzed algorithm, the pseudo-code and its corresponding code in Matlab are provided. The results from this study demonstrate that the choice of the optimization approach strongly impacts the numerical performance. It is also shown that no methods are currently available that efficiently solve cone regression problems of large dimension (more than many thousands of points). We suggest further research to fill this gap by exploiting and adapting classical multi-scale strategies to compute an approximate solution.
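To make the isotonic special case mentioned above concrete, here is a minimal sketch in Python of the classical pool-adjacent-violators algorithm. It is not the Matlab code provided with the paper, and it is not necessarily one of the algorithms the paper compares. It minimizes a weighted sum of squared residuals subject to the fitted values being nondecreasing, and the function name `isotonic_fit` is hypothetical.

```python
def isotonic_fit(y, w=None):
    """Nondecreasing weighted least-squares fit via pool-adjacent-violators.

    Illustrative sketch only: minimizes sum_i w[i] * (y[i] - fit[i])**2
    subject to fit[0] <= fit[1] <= ... <= fit[n-1].
    """
    n = len(y)
    if w is None:
        w = [1.0] * n
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the blocks back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


print(isotonic_fit([1, 3, 2, 4]))
```

In this example the violating pair (3, 2) is pooled into its mean, which is the projection of the data onto the cone of nondecreasing sequences.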
</p>projecteuclid.org/euclid.ssu/1463663054_20160519090416Thu, 19 May 2016 09:04 EDTA comparison of spatial predictors when datasets could be very largehttp://projecteuclid.org/euclid.ssu/1468952015<strong>Jonathan R. Bradley</strong>, <strong>Noel Cressie</strong>, <strong>Tao Shi</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 10, 100--131.</p><p><strong>Abstract:</strong><br/>
In this article, we review and compare a number of methods of spatial prediction, where each method is viewed as an algorithm that processes spatial data. To demonstrate the breadth of available choices, we consider both traditional and more-recently-introduced spatial predictors. Specifically, in our exposition we review: traditional stationary kriging, smoothing splines, negative-exponential distance-weighting, fixed rank kriging, modified predictive processes, a stochastic partial differential equation approach, and lattice kriging. This comparison is meant to provide a service to practitioners wishing to decide between spatial predictors. Hence, we provide technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor. We use a benchmark dataset of $\mathrm{CO}_{2}$ data from NASA’s AIRS instrument to address computational efficiencies that include CPU time and memory usage. Furthermore, the predictive performance of each spatial predictor is assessed empirically using a hold-out subset of the AIRS data.
</p>projecteuclid.org/euclid.ssu/1468952015_20160719141340Tue, 19 Jul 2016 14:13 EDTMeasuring multivariate association and beyondhttp://projecteuclid.org/euclid.ssu/1479351622<strong>Julie Josse</strong>, <strong>Susan Holmes</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 10, 132--167.</p><p><strong>Abstract:</strong><br/>
Simple correlation coefficients between two variables have been generalized to measure association between two matrices in many ways. Coefficients such as the RV coefficient, the distance covariance (dCov) coefficient and kernel based coefficients are being used by different research communities. Scientists use these coefficients to test whether two random vectors are linked. Once it has been ascertained that there is such association through testing, then a next step, often ignored, is to explore and uncover the association’s underlying patterns.
This article provides a survey of various measures of dependence between random vectors and tests of independence and emphasizes the connections and differences between the various approaches. After providing definitions of the coefficients and associated tests, we present the recent improvements that enhance their statistical properties and ease of interpretation. We summarize multi-table approaches and provide scenarios where the indices can provide useful summaries of heterogeneous multi-block data. We illustrate these different strategies on several examples of real data and suggest directions for future research.
</p>projecteuclid.org/euclid.ssu/1479351622_20161116220034Wed, 16 Nov 2016 22:00 ESTBasic models and questions in statistical network analysishttps://projecteuclid.org/euclid.ssu/1504836152<strong>Miklós Z. Rácz</strong>, <strong>Sébastien Bubeck</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 11, 1--47.</p><p><strong>Abstract:</strong><br/>
Extracting information from large graphs has become an important statistical problem since network data is now common in various fields. In this minicourse we will investigate the most natural statistical questions for three canonical probabilistic models of networks: (i) community detection in the stochastic block model, (ii) finding the embedding of a random geometric graph, and (iii) finding the original vertex in a preferential attachment tree. Along the way we will cover many interesting topics in probability theory such as Pólya urns, large deviation theory, concentration of measure in high dimension, entropic central limit theorems, and more.
</p>projecteuclid.org/euclid.ssu/1504836152_20170907220234Thu, 07 Sep 2017 22:02 EDTA design-sensitive approach to fitting regression models with complex survey datahttps://projecteuclid.org/euclid.ssu/1516179619<strong>Phillip S. Kott</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 12, 1--17.</p><p><strong>Abstract:</strong><br/>
Fitting complex survey data to regression equations is explored under a design-sensitive model-based framework. A robust version of the standard model assumes that the expected value of the difference between the dependent variable and its model-based prediction is zero no matter what the values of the explanatory variables. The extended model assumes only that the difference is uncorrelated with the covariates. Little is assumed about the error structure of this difference under either model other than independence across primary sampling units. The standard model often fails in practice, but the extended model very rarely does. Under this framework some of the methods developed in the conventional design-based, pseudo-maximum-likelihood framework, such as fitting weighted estimating equations and sandwich mean-squared-error estimation, are retained but their interpretations change. Few of the ideas here are new to the refereed literature. The goal instead is to collect those ideas and put them into a unified conceptual framework.
</p>projecteuclid.org/euclid.ssu/1516179619_20180117040020Wed, 17 Jan 2018 04:00 ESTVariable selection methods for model-based clusteringhttps://projecteuclid.org/euclid.ssu/1524729611<strong>Michael Fop</strong>, <strong>Thomas Brendan Murphy</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 12, 18--65.</p><p><strong>Abstract:</strong><br/>
Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples.
</p>projecteuclid.org/euclid.ssu/1524729611_20180426040016Thu, 26 Apr 2018 04:00 EDTAn approximate likelihood perspective on ABC methodshttps://projecteuclid.org/euclid.ssu/1528509818<strong>George Karabatsos</strong>, <strong>Fabrizio Leisen</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 12, 66--104.</p><p><strong>Abstract:</strong><br/>
We are living in the big data era, as current technologies and networks allow for the easy and routine collection of data sets in different disciplines. Bayesian Statistics offers a flexible modeling approach which is attractive for describing the complexity of these datasets. These models often exhibit a likelihood function which is intractable due to the large sample size, high number of parameters, or functional complexity. Approximate Bayesian Computational (ABC) methods provide likelihood-free methods for performing statistical inferences with Bayesian models defined by intractable likelihood functions. The vastness of the literature on ABC methods has created a need to review and relate all ABC approaches so that scientists can more readily understand and apply them for their own work. This article provides a unifying review, general representation, and classification of all ABC methods from the view of approximate likelihood theory. This clarifies how ABC methods can be characterized, related, combined, improved, and applied for future research. Possible future research in ABC is then outlined.
</p>projecteuclid.org/euclid.ssu/1528509818_20180608220342Fri, 08 Jun 2018 22:03 EDTA review of dynamic network models with latent variableshttps://projecteuclid.org/euclid.ssu/1535961690<strong>Bomin Kim</strong>, <strong>Kevin H. Lee</strong>, <strong>Lingzhou Xue</strong>, <strong>Xiaoyue Niu</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 12, 105--135.</p><p><strong>Abstract:</strong><br/>
We present a selective review of statistical modeling of dynamic networks. We focus on models with latent variables, specifically, the latent space models and the latent class models (or stochastic blockmodels), which investigate both the observed features and the unobserved structure of networks. We begin with an overview of the static models, and then we introduce the dynamic extensions. For each dynamic model, we also discuss the applications that have been studied in the literature, with the data sources listed in the Appendix. Based on the review, we summarize a list of open problems and challenges in dynamic network modeling with latent variables.
</p>projecteuclid.org/euclid.ssu/1535961690_20180903040140Mon, 03 Sep 2018 04:01 EDTPitfalls of significance testing and $p$-value variability: An econometrics perspectivehttps://projecteuclid.org/euclid.ssu/1538618436<strong>Norbert Hirschauer</strong>, <strong>Sven Grüner</strong>, <strong>Oliver Mußhoff</strong>, <strong>Claudia Becker</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 12, 136--172.</p><p><strong>Abstract:</strong><br/>
Data on how many scientific findings are reproducible are generally bleak, and a wealth of papers have warned against misuses of the $p$-value and resulting false findings in recent years. This paper discusses the question of what we can(not) learn from the $p$-value, which is still widely considered as the gold standard of statistical validity. We aim to provide a non-technical and easily accessible resource for statistical practitioners who wish to spot and avoid misinterpretations and misuses of statistical significance tests. For this purpose, we first classify and describe the most widely discussed (“classical”) pitfalls of significance testing, and review published work on these misuses with a focus on regression-based “confirmatory” studies. This includes a description of the single-study bias and a simulation-based illustration of how proper meta-analysis compares to misleading significance counts (“vote counting”). Going beyond the classical pitfalls, we also use simulation to provide intuition that relying on the statistical estimate “$p$-value” as a measure of evidence without considering its sample-to-sample variability falls short of the mark even within an otherwise appropriate interpretation. We conclude with a discussion of the exigencies of informed approaches to statistical inference and corresponding institutional reforms.
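The sample-to-sample variability of the $p$-value mentioned above can be illustrated with a small simulation. The following Python sketch is not the simulation design used in the paper. It assumes a two-sided z-test of a zero mean with known unit variance, and the effect size, sample size, and function name `simulate_p_values` are hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist


def simulate_p_values(effect, n, reps, seed=0):
    """Two-sided z-test p-values for H0: mean = 0, known unit variance.

    Each replication draws a fresh sample of size n from N(effect, 1),
    so the spread of the returned p-values reflects sampling variability.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(reps):
        sample_mean = sum(rng.gauss(effect, 1.0) for _ in range(n)) / n
        z = sample_mean * math.sqrt(n)
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return p_values


p = sorted(simulate_p_values(effect=0.3, n=50, reps=2000))
share_significant = sum(v < 0.05 for v in p) / len(p)
print("5th, 50th, 95th percentile of p:", p[100], p[1000], p[1900])
print("share of replications with p < 0.05:", share_significant)
```

Under these assumed settings the same true effect produces both clearly "significant" and clearly "non-significant" replications, which is the sense in which a single $p$-value is itself a noisy estimate.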
</p>projecteuclid.org/euclid.ssu/1538618436_20181003220044Wed, 03 Oct 2018 22:00 EDTAdditive monotone regression in high and lower dimensionshttps://projecteuclid.org/euclid.ssu/1560996027<strong>Solveig Engebretsen</strong>, <strong>Ingrid K. Glad</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 13, 1--51.</p><p><strong>Abstract:</strong><br/>
In numerous problems where the aim is to estimate the effect of a predictor variable on a response, one can assume a monotone relationship. For example, dose-effect models in medicine are of this type. In a multiple regression setting, additive monotone regression models assume that each predictor has a monotone effect on the response. In this paper, we present an overview and comparison of very recent frequentist methods for fitting additive monotone regression models. Three of the methods we present can be used both in the high dimensional setting, where the number of parameters $p$ exceeds the number of observations $n$, and in the classical multiple setting where $1<p\leq n$. However, many of the most recent methods only apply to the classical setting. The methods are compared through simulation experiments in terms of efficiency, prediction error and variable selection properties in both settings, and they are applied to the Boston housing data. We conclude with some recommendations on when the various methods perform best.
</p>projecteuclid.org/euclid.ssu/1560996027_20190619220041Wed, 19 Jun 2019 22:00 EDTHalfspace depth and floating bodyhttps://projecteuclid.org/euclid.ssu/1561169006<strong>Stanislav Nagy</strong>, <strong>Carsten Schütt</strong>, <strong>Elisabeth M. Werner</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 13, 52--118.</p><p><strong>Abstract:</strong><br/>
Little-known relations between the renowned concept of the halfspace depth for multivariate data and notions from convex and affine geometry are discussed. Maximum halfspace depth may be regarded as a measure of symmetry for random vectors. As such, the maximum depth stands as a generalization of a measure of symmetry for convex sets, well studied in geometry. Under a mild assumption, the upper level sets of the halfspace depth coincide with the convex floating bodies of measures used in the definition of the affine surface area for convex bodies in Euclidean spaces. These connections enable us to partially resolve some persistent open problems regarding theoretical properties of the depth.
</p>projecteuclid.org/euclid.ssu/1561169006_20190621220337Fri, 21 Jun 2019 22:03 EDTPLS for Big Data: A unified parallel algorithm for regularised group PLShttps://projecteuclid.org/euclid.ssu/1567411220<strong>Pierre Lafaye de Micheaux</strong>, <strong>Benoît Liquet</strong>, <strong>Matthew Sutton</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 13, 119--149.</p><p><strong>Abstract:</strong><br/>
Partial Least Squares (PLS) methods have been heavily exploited to analyse the association between two blocks of data. These powerful approaches can be applied to data sets where the number of variables is greater than the number of observations and in the presence of high collinearity between variables. Different sparse versions of PLS have been developed to integrate multiple data sets while simultaneously selecting the contributing variables. Sparse modeling is a key factor in obtaining better estimators and identifying associations between multiple data sets. The cornerstone of the sparse PLS methods is the link between the singular value decomposition (SVD) of a matrix (constructed from deflated versions of the original data) and least squares minimization in linear regression. We review four popular PLS methods for two blocks of data. A unified algorithm is proposed to perform all four types of PLS including their regularised versions. We present various approaches to decrease the computation time and show how the whole procedure can be scalable to big data sets. The bigsgPLS R package implements our unified algorithm and is available at https://github.com/matt-sutton/bigsgPLS .
</p>projecteuclid.org/euclid.ssu/1567411220_20190902040032Mon, 02 Sep 2019 04:00 EDTScalar-on-function regression for predicting distal outcomes from intensively gathered longitudinal data: Interpretability for applied scientistshttps://projecteuclid.org/euclid.ssu/1573009381<strong>John J. Dziak</strong>, <strong>Donna L. Coffman</strong>, <strong>Matthew Reimherr</strong>, <strong>Justin Petrovich</strong>, <strong>Runze Li</strong>, <strong>Saul Shiffman</strong>, <strong>Mariya P. Shiyko</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 13, 150--180.</p><p><strong>Abstract:</strong><br/>
Researchers are sometimes interested in predicting a distal or external outcome (such as smoking cessation at follow-up) from the trajectory of an intensively recorded longitudinal variable (such as urge to smoke). This can be done in a semiparametric way via scalar-on-function regression. However, the resulting fitted coefficient regression function requires special care for correct interpretation, as it represents the joint relationship of time points to the outcome, rather than a marginal or cross-sectional relationship. We provide practical guidelines, based on experience with scientific applications, for helping practitioners interpret their results and illustrate these ideas using data from a smoking cessation study.
</p>projecteuclid.org/euclid.ssu/1573009381_20191105220306Tue, 05 Nov 2019 22:03 ESTEstimating the size of a hidden finite set: Large-sample behavior of estimatorshttps://projecteuclid.org/euclid.ssu/1578106918<strong>Si Cheng</strong>, <strong>Daniel J. Eck</strong>, <strong>Forrest W. Crawford</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 14, 1--31.</p><p><strong>Abstract:</strong><br/>
A finite set is “hidden” if its elements are not directly enumerable or if its size cannot be ascertained via a deterministic query. In public health, epidemiology, demography, ecology and intelligence analysis, researchers have developed a wide variety of indirect statistical approaches, under different models for sampling and observation, for estimating the size of a hidden set. Some methods make use of random sampling with known or estimable sampling probabilities, and others make structural assumptions about relationships (e.g. ordering or network information) between the elements that comprise the hidden set. In this review, we describe models and methods for learning about the size of a hidden finite set, with special attention to asymptotic properties of estimators. We study the properties of these methods under two asymptotic regimes: “infill”, in which the number of fixed-size samples increases but the population size remains constant, and “outfill”, in which the sample size and population size grow together. Statistical properties under these two regimes can be dramatically different.
</p>projecteuclid.org/euclid.ssu/1578106918_20200103220200Fri, 03 Jan 2020 22:02 ESTFlexible, boundary adapted, nonparametric methods for the estimation of univariate piecewise-smooth functionshttps://projecteuclid.org/euclid.ssu/1580806810<strong>Umberto Amato</strong>, <strong>Anestis Antoniadis</strong>, <strong>Italia De Feis</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 14, 32--70.</p><p><strong>Abstract:</strong><br/>
We present and compare some nonparametric estimation methods (wavelet and/or spline-based) designed to recover a one-dimensional piecewise-smooth regression function in both a fixed-design regression model (equidistant or non-equidistant) and a random-design model.
Wavelet methods are known to be very competitive in terms of denoising and compression, owing to their simultaneous localization of a function in time and frequency. However, boundary assumptions, such as periodicity or symmetry, generate bias and artificial wiggles which degrade overall accuracy.
Simple methods have been proposed in the literature for reducing the bias at the boundaries. We introduce new ones based on adaptive combinations of two estimators. The underlying idea is to combine a highly accurate method for non-regular functions, e.g., wavelets, with one that is well behaved at boundaries, e.g., splines or local polynomials. We provide some asymptotic optimality results supporting our approach. All the methods can handle data with a random design. We also sketch a generalization to the multidimensional setting.
To study the performance of the proposed approaches, we have conducted an extensive set of simulations on synthetic data. Regression analyses of two real data sets using these procedures demonstrate their effectiveness.
</p>projecteuclid.org/euclid.ssu/1580806810_20200204040014Tue, 04 Feb 2020 04:00 ESTCan $p$-values be meaningfully interpreted without random sampling?https://projecteuclid.org/euclid.ssu/1585274548<strong>Norbert Hirschauer</strong>, <strong>Sven Grüner</strong>, <strong>Oliver Mußhoff</strong>, <strong>Claudia Becker</strong>, <strong>Antje Jantsch</strong>. <p><strong>Source: </strong>Statistics Surveys, Volume 14, 71--91.</p><p><strong>Abstract:</strong><br/>
Besides the inferential errors that abound in the interpretation of $p$-values, the probabilistic pre-conditions (i.e. random sampling or equivalent) for using them at all are often not met by observational studies in the social sciences. This paper systematizes different sampling designs and discusses the restrictive data-collection requirements that are the indispensable prerequisite for using $p$-values.
</p>projecteuclid.org/euclid.ssu/1585274548_20200326220231Thu, 26 Mar 2020 22:02 EDT