## Bernoulli

Volume 21, Number 1 (2015), 83-143.

### Lasso and probabilistic inequalities for multivariate point processes

#### Abstract

Due to its low computational cost, Lasso is an attractive regularization method for high-dimensional statistical settings. In this paper, we consider multivariate counting processes depending on an unknown function parameter to be estimated by linear combinations of a fixed dictionary. To select coefficients, we propose an adaptive $\ell_{1}$-penalization methodology, where data-driven weights of the penalty are derived from new Bernstein-type inequalities for martingales. Oracle inequalities are established under assumptions on the Gram matrix of the dictionary. Nonasymptotic probabilistic results for multivariate Hawkes processes are proven, which allows us to check these assumptions by considering general dictionaries based on histograms, Fourier or wavelet bases. Motivated by problems of neuronal activity inference, we finally carry out a simulation study for multivariate Hawkes processes and compare our methodology with the adaptive Lasso procedure proposed by Zou (J. Amer. Statist. Assoc. 101 (2006) 1418–1429). We observe an excellent behavior of our procedure. We rely on theoretical aspects for the essential question of tuning our methodology. Unlike the adaptive Lasso of (J. Amer. Statist. Assoc. 101 (2006) 1418–1429), our tuning procedure is proven to be robust with respect to all the parameters of the problem, revealing its potential for concrete purposes, in particular in neuroscience.

#### Article information

Source
Bernoulli, Volume 21, Number 1 (2015), 83-143.

Dates
First available in Project Euclid: 17 March 2015

Permanent link to this document
https://projecteuclid.org/euclid.bj/1426597065

Digital Object Identifier
doi:10.3150/13-BEJ562

Mathematical Reviews number (MathSciNet)
MR3322314

Zentralblatt MATH identifier
1375.60092

#### Citation

Hansen, Niels Richard; Reynaud-Bouret, Patricia; Rivoirard, Vincent. Lasso and probabilistic inequalities for multivariate point processes. Bernoulli 21 (2015), no. 1, 83–143. doi:10.3150/13-BEJ562. https://projecteuclid.org/euclid.bj/1426597065

#### References

• [1] Aalen, O. (1980). A model for nonparametric regression analysis of counting processes. In Mathematical Statistics and Probability Theory (Proc. Sixth Internat. Conf., Wisła, 1978). Lecture Notes in Statist. 2 1–25. New York: Springer.
• [2] Andersen, P.K., Borgan, Ø., Gill, R.D. and Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer Series in Statistics. New York: Springer.
• [3] Bacry, E., Delattre, S., Hoffmann, M. and Muzy, J.F. (2013). Some limit theorems for Hawkes processes and application to financial statistics. Stochastic Process. Appl. 123 2475–2499.
• [4] Bercu, B. and Touati, A. (2008). Exponential inequalities for self-normalized martingales with applications. Ann. Appl. Probab. 18 1848–1869.
• [5] Bertin, K., Le Pennec, E. and Rivoirard, V. (2011). Adaptive Dantzig density estimation. Ann. Inst. Henri Poincaré Probab. Stat. 47 43–74.
• [6] Bickel, P.J., Ritov, Y. and Tsybakov, A.B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• [7] Bowsher, C.G. (2010). Stochastic kinetic models: Dynamic independence, modularity and graphs. Ann. Statist. 38 2242–2281.
• [8] Brémaud, P. (1981). Point Processes and Queues. New York: Springer.
• [9] Brémaud, P. and Massoulié, L. (1996). Stability of nonlinear Hawkes processes. Ann. Probab. 24 1563–1588.
• [10] Brette, R. and Destexhe, A., eds. (2012). Handbook of Neural Activity Measurement. Cambridge: Cambridge Univ. Press.
• [11] Brunel, E. and Comte, F. (2005). Penalized contrast estimation of density and hazard rate with censored data. Sankhyā 67 441–475.
• [12] Brunel, E. and Comte, F. (2008). Adaptive estimation of hazard rate with censored data. Comm. Statist. Theory Methods 37 1284–1305.
• [13] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-dimensional Data. Heidelberg: Springer.
• [14] Bunea, F. and McKeague, I.W. (2005). Covariate selection for semiparametric hazard function regression models. J. Multivariate Anal. 92 186–204.
• [15] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194.
• [16] Bunea, F., Tsybakov, A.B. and Wegkamp, M.H. (2006). Aggregation and sparsity via $\ell_{1}$ penalized least squares. In Learning Theory. Lecture Notes in Computer Science 4005 379–391. Berlin: Springer.
• [17] Bunea, F., Tsybakov, A.B. and Wegkamp, M.H. (2007). Sparse density estimation with $\ell_{1}$ penalties. In Learning Theory. Lecture Notes in Computer Science 4539 530–543. Berlin: Springer.
• [18] Bunea, F., Tsybakov, A.B. and Wegkamp, M.H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674–1697.
• [19] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
• [20] Carstensen, L., Sandelin, A., Winther, O. and Hansen, N.R. (2010). Multivariate Hawkes process models of the occurrence of regulatory elements and an analysis of the pilot ENCODE regions. BMC Bioinformatics 11 456.
• [21] Chagny, G. (2012). Adaptive warped kernel estimators. Available at http://hal.archives-ouvertes.fr/hal-00715184.
• [22] Chornoboy, E.S., Schramm, L.P. and Karr, A.F. (1988). Maximum likelihood identification of neural point process systems. Biol. Cybernet. 59 265–275.
• [23] Comte, F., Gaïffas, S. and Guilloux, A. (2011). Adaptive estimation of the conditional intensity of marker-dependent counting processes. Ann. Inst. Henri Poincaré Probab. Stat. 47 1171–1196.
• [24] Daley, D.J. and Vere-Jones, D. (2003). An Introduction to the Theory of Point Processes. Vol. I, 2nd ed. New York: Springer.
• [25] de la Peña, V.H. (1999). A general class of exponential inequalities for martingales and ratios. Ann. Probab. 27 537–564.
• [26] de la Peña, V.H., Lai, T.L. and Shao, Q.-M. (2009). Self-normalized Processes. Berlin: Springer.
• [27] Dzhaparidze, K. and van Zanten, J.H. (2001). On Bernstein-type inequalities for martingales. Stochastic Process. Appl. 93 109–117.
• [28] Fu, W.J. (1998). Penalized regressions: The bridge versus the lasso. J. Comput. Graph. Statist. 7 397–416.
• [29] Gaïffas, S. and Guilloux, A. (2012). High-dimensional additive hazards models and the Lasso. Electron. J. Stat. 6 522–546.
• [30] Grégoire, G. (1993). Least squares cross-validation for counting process intensities. Scand. J. Statist. 20 343–360.
• [31] Grün, S., Diesmann, M., Grammont, F., Riehle, A. and Aertsen, A. (1999). Detecting unitary events without discretization in time. J. Neurosci. Meth. 94 67–79.
• [32] Gusto, G. and Schbath, S. (2005). FADO: A statistical method to detect favored or avoided distances between occurrences of motifs using the Hawkes’ model. Stat. Appl. Genet. Mol. Biol. 4 Art. 24, 28 pp. (electronic).
• [33] Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998). Wavelets, Approximation, and Statistical Applications. Lecture Notes in Statistics 129. New York: Springer.
• [34] Hawkes, A.G. (1971). Point spectra of some mutually exciting point processes. J. Roy. Statist. Soc. Ser. B 33 438–443.
• [35] Huang, J., Ma, S. and Zhang, C.-H. (2008). Adaptive Lasso for sparse high-dimensional regression models. Statist. Sinica 18 1603–1618.
• [36] Jacobsen, M. (2006). Point Process Theory and Applications. Marked Point and Piecewise Deterministic Processes. Boston, MA: Birkhäuser.
• [37] Koltchinskii, V., Lounici, K. and Tsybakov, A.B. (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Statist. 39 2302–2329.
• [38] Krumin, M., Reutsky, I. and Shoham, S. (2010). Correlation-based analysis and generation of multiple spike trains using Hawkes models with an exogenous input. Front. Comput. Neurosci. 4 147.
• [39] Letue, F. (2000). Modèle de Cox: Estimation par sélection de modèle et modèle de chocs bivarié. Ph.D. thesis.
• [40] Liptser, R. and Spokoiny, V. (2000). Deviation probability bound for martingales with applications to statistical estimation. Statist. Probab. Lett. 46 347–357.
• [41] Massart, P. (2007). Concentration Inequalities and Model Selection. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23, 2003. Lecture Notes in Math. 1896. Berlin: Springer.
• [42] Masud, M.S. and Borisyuk, R. (2011). Statistical technique for analysing functional connectivity of multiple spike trains. J. Neurosci. Meth. 196 201–219.
• [43] Meinshausen, N. (2007). Relaxed Lasso. Comput. Statist. Data Anal. 52 374–393.
• [44] Mitchell, L. and Cates, M.E. (2010). Hawkes process as a model of social interactions: A view on video dynamics. J. Phys. A 43 045101, 11.
• [45] Pernice, V., Staude, B., Cardanobile, S. and Rotter, S. (2011). How structure determines correlations in neuronal networks. PLoS Comput. Biol. 7 e1002059, 14.
• [46] Pernice, V., Staude, B., Cardanobile, S. and Rotter, S. (2012). Recurrent interactions in spiking networks with arbitrary topology. Phys. Rev. E 85 031916.
• [47] Pillow, J.W., Shlens, J., Paninski, L., Sher, A., Litke, A.M., Chichilnisky, E.J. and Simoncelli, E.P. (2008). Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454 995–999.
• [48] Reynaud-Bouret, P. (2003). Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Related Fields 126 103–153.
• [49] Reynaud-Bouret, P. (2006). Penalized projection estimators of the Aalen multiplicative intensity. Bernoulli 12 633–661.
• [50] Reynaud-Bouret, P. and Rivoirard, V. (2010). Near optimal thresholding estimation of a Poisson intensity on the real line. Electron. J. Stat. 4 172–238.
• [51] Reynaud-Bouret, P. and Roy, E. (2006). Some non asymptotic tail estimates for Hawkes processes. Bull. Belg. Math. Soc. Simon Stevin 13 883–896.
• [52] Reynaud-Bouret, P. and Schbath, S. (2010). Adaptive estimation for Hawkes processes; application to genome analysis. Ann. Statist. 38 2781–2822.
• [53] Reynaud-Bouret, P., Tuleau-Malot, C., Rivoirard, V. and Grammont, F. (2013). Spike trains as (in)homogeneous Poisson processes or Hawkes processes: Nonparametric adaptive estimation and goodness-of-fit tests. Available at http://hal.archives-ouvertes.fr/hal-00789127.
• [54] Rudelson, M. and Vershynin, R. (2008). On sparse reconstruction from Fourier and Gaussian measurements. Comm. Pure Appl. Math. 61 1025–1045.
• [55] Rudelson, M. and Vershynin, R. (2009). Smallest singular value of a random rectangular matrix. Comm. Pure Appl. Math. 62 1707–1739.
• [56] Rudelson, M. and Vershynin, R. (2010). Non-asymptotic theory of random matrices: Extreme singular values. In Proceedings of the International Congress of Mathematicians III 1576–1602. New Delhi: Hindustan Book Agency.
• [57] Shorack, G.R. and Wellner, J.A. (1986). Empirical Processes with Applications to Statistics. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. New York: Wiley.
• [58] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
• [59] van de Geer, S. (1995). Exponential inequalities for martingales, with application to maximum likelihood estimation for counting processes. Ann. Statist. 23 1779–1801.
• [60] van de Geer, S., Bühlmann, P. and Zhou, S. (2011). The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electron. J. Stat. 5 688–749.
• [61] van de Geer, S.A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614–645.
• [62] Vere-Jones, D. and Ozaki, T. (1982). Some examples of statistical estimation applied to earthquake data I: Cyclic Poisson and self-exciting models. Ann. I. Stat. Math. 34 189–207.
• [63] Willett, R.M. and Nowak, R.D. (2007). Multiscale Poisson intensity and density estimation. IEEE Trans. Inform. Theory 53 3171–3187.
• [64] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.