## The Annals of Statistics

- Ann. Statist.
- Volume 45, Number 4 (2017), 1694-1727.

### Asymptotic and finite-sample properties of estimators based on stochastic gradients

Panos Toulis and Edoardo M. Airoldi

#### Abstract

Stochastic gradient descent procedures have gained popularity for parameter estimation from large data sets. However, their statistical properties are not well understood, in theory. And in practice, avoiding numerical instability requires careful tuning of key parameters. Here, we introduce implicit stochastic gradient descent procedures, which involve parameter updates that are implicitly defined. Intuitively, implicit updates shrink standard stochastic gradient descent updates. The amount of shrinkage depends on the observed Fisher information matrix, which does not need to be explicitly computed; thus, implicit procedures increase stability without increasing the computational burden. Our theoretical analysis provides the first full characterization of the asymptotic behavior of both standard and implicit stochastic gradient descent-based estimators, including finite-sample error bounds. Importantly, analytical expressions for the variances of these stochastic gradient-based estimators reveal their exact loss of efficiency. We also develop new algorithms to compute implicit stochastic gradient descent-based estimators for generalized linear models, Cox proportional hazards, M-estimators, in practice, and perform extensive experiments. Our results suggest that implicit stochastic gradient descent procedures are poised to become a workhorse for approximate inference from large data sets.
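As a minimal sketch of what "implicitly defined" means here, consider the simplest case, a linear model with squared-error loss. This setting is chosen for illustration and is not one of the algorithms developed in the paper. The implicit update is $\theta_n = \theta_{n-1} + \gamma_n (y_n - x_n^\top \theta_n) x_n$, with $\theta_n$ on both sides. In this special case the equation solves in closed form to the standard update scaled by $1 / (1 + \gamma_n \lVert x_n \rVert^2)$, which makes the shrinkage visible. All function names, the learning-rate schedule `lr0 / n`, and the simulated data are assumptions made for the example.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def explicit_step(theta, x, y, lr):
    """Standard SGD step for the loss 0.5 * (y - x.theta)^2."""
    resid = y - dot(x, theta)
    return [t + lr * resid * xi for t, xi in zip(theta, x)]


def implicit_step(theta, x, y, lr):
    """Implicit SGD step: solves theta_new = theta + lr * (y - x.theta_new) * x.

    For the linear model this fixed-point equation has a closed form:
    the explicit step shrunk by the factor 1 / (1 + lr * ||x||^2).
    """
    resid = y - dot(x, theta)
    shrink = 1.0 / (1.0 + lr * dot(x, x))
    return [t + lr * shrink * resid * xi for t, xi in zip(theta, x)]


def make_data(n, true_theta, noise_sd=0.1, seed=0):
    """Simulate n observations (x, y) with y = x.true_theta + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_theta]
        y = dot(x, true_theta) + rng.gauss(0.0, noise_sd)
        data.append((x, y))
    return data


def run(step, data, lr0, dim):
    """One pass over the data with learning rate lr0 / n (assumed schedule)."""
    theta = [0.0] * dim
    for n, (x, y) in enumerate(data, start=1):
        theta = step(theta, x, y, lr0 / n)
    return theta


if __name__ == "__main__":
    true_theta = [1.0, -2.0]
    data = make_data(2000, true_theta)
    # A deliberately large initial learning rate to stress both procedures.
    print("explicit:", run(explicit_step, data, 100.0, 2))
    print("implicit:", run(implicit_step, data, 100.0, 2))
```

Because the shrinkage factor keeps the effective step `lr * shrink * ||x||^2` below 1, a single implicit step on noiseless data can never move the iterate farther from the true parameter, however large `lr` is. The explicit step has no such guarantee, which is one way to read the abstract's stability claim in this toy setting.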

#### Article information

**Source**

Ann. Statist., Volume 45, Number 4 (2017), 1694-1727.

**Dates**

Received: September 2015

Revised: August 2016

First available in Project Euclid: 28 June 2017

**Permanent link to this document**

https://projecteuclid.org/euclid.aos/1498636871

**Digital Object Identifier**

doi:10.1214/16-AOS1506

**Mathematical Reviews number (MathSciNet)**

MR3670193

**Zentralblatt MATH identifier**

1378.62046

**Subjects**

Primary:

- 62L20: Stochastic approximation
- 62F10: Point estimation
- 62L12: Sequential estimation
- 62F12: Asymptotic properties of estimators
- 62F35: Robustness and adaptive procedures

**Keywords**

Stochastic approximation, implicit updates, asymptotic variance, generalized linear models, Cox proportional hazards, M-estimation, maximum likelihood, exponential family, statistical efficiency, numerical stability

#### Citation

Toulis, Panos; Airoldi, Edoardo M. Asymptotic and finite-sample properties of estimators based on stochastic gradients. Ann. Statist. 45 (2017), no. 4, 1694-1727. doi:10.1214/16-AOS1506. https://projecteuclid.org/euclid.aos/1498636871

#### Supplemental materials

- Supplement to “Asymptotic and finite-sample properties of estimators based on stochastic gradients”. The proofs of all technical results are provided in an online supplement [Toulis and Airoldi (2017)]. There, we also provide numerical results that extend the results in Section 4 of this article (referred to as the “main paper” in the supplement). Digital Object Identifier: doi:10.1214/16-AOS1506SUPP. Supplemental files are immediately available to subscribers. Non-subscribers gain access to supplemental files with the purchase of the article.