Risk bounds when learning infinitely many response functions by ordinary linear regression

Vincent Plassier; Francois Portier; Johan Segers

doi:10.1214/22-AIHP1259

Abstract

Consider the problem of learning a large number of response functions simultaneously based on the same input variables. The training data consist of a single independent random sample of the input variables drawn from a common distribution together with the associated responses. The input variables are mapped into a high-dimensional linear space, called the feature space, and the response functions are modelled as linear functionals of the mapped features, with coefficients calibrated via ordinary least squares. We provide convergence guarantees on the worst-case excess prediction risk by controlling the convergence rate of the excess risk uniformly in the response function. The dimension of the feature map is allowed to tend to infinity with the sample size. The collection of response functions, although potentially infinite, is supposed to have a finite Vapnik–Chervonenkis dimension. The bound derived can be applied when building multiple surrogate models in a reasonable computing time.

We consider the problem of simultaneously learning a large number of response functions from a common input sample composed of random variables assumed to be independent and identically distributed. These data are mapped into a high-dimensional space, called the feature space. The response functions are approximated by linear combinations of coordinates of the feature space, with coefficients calibrated by ordinary least squares. We provide convergence guarantees on the excess prediction risk by controlling the convergence rate of the excess risk uniformly over the class of response functions. The dimension of the feature space may tend to infinity with the sample size. However, the collection of response functions, although potentially infinite, is assumed to have a finite Vapnik–Chervonenkis dimension. The resulting bounds guarantee the sound performance of surrogate models used to replace a family of complex models with a view to reducing computational costs.
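The setting described in the abstract can be illustrated with a minimal numerical sketch. It is not the authors' code: the monomial feature map, the family of hinge responses indexed by a threshold `t`, the uniform input distribution, and all function and variable names are assumptions chosen only for illustration. The sketch fits every response in the family by ordinary least squares on one shared design matrix, then estimates the worst-case excess risk over the family on a large independent sample.

```python
import numpy as np


def feature_map(x, p):
    """Hypothetical feature map: monomials 1, x, ..., x**(p-1)."""
    return np.vander(x, N=p, increasing=True)


def responses(x, ts):
    """Hypothetical response family f_t(x) = max(x - t, 0), one column per t."""
    return np.maximum(x[:, None] - ts[None, :], 0.0)


def fit_all_responses(x, Y, p):
    """OLS coefficients for every response at once (one column per response).

    All responses share the same design matrix, so a single least-squares
    solve handles the whole family.
    """
    Phi = feature_map(x, p)
    B, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return B


rng = np.random.default_rng(0)
n, p = 500, 6                       # sample size and feature dimension (assumed)
ts = np.linspace(-0.9, 0.9, 50)     # finite grid standing in for the family

# Training: one input sample, many responses.
x = rng.uniform(-1.0, 1.0, size=n)
B = fit_all_responses(x, responses(x, ts), p)

# Large independent sample used as a proxy for the population risk.
x_test = rng.uniform(-1.0, 1.0, size=20_000)
Y_test = responses(x_test, ts)
Phi_test = feature_map(x_test, p)

# Risk of the fitted predictors, per response.
risk_hat = np.mean((Y_test - Phi_test @ B) ** 2, axis=0)

# Risk of the best linear predictor in the feature space (on the proxy sample).
B_star, *_ = np.linalg.lstsq(Phi_test, Y_test, rcond=None)
risk_star = np.mean((Y_test - Phi_test @ B_star) ** 2, axis=0)

# Worst-case excess risk over the family.
worst_excess = float(np.max(risk_hat - risk_star))
print(f"worst-case excess risk over {ts.size} responses: {worst_excess:.3e}")
```

Because the design matrix is shared, the cost of adding further response functions is essentially one extra column in the right-hand side, which is what makes building many surrogate models at once computationally reasonable.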

Acknowledgments and Disclosure of Funding

The authors gratefully acknowledge comments and suggestions from an anonymous reviewer, which stimulated us to sharpen the main theorem and work out the application to Monte Carlo methods. The authors also thank Rémi Leluc for his valuable feedback and Aigerim Zhuman for her insightful remarks. This project was supported financially by FNRS-F.R.S. grant CDR J.0146.19.

Citation

Vincent Plassier, Francois Portier, Johan Segers. "Risk bounds when learning infinitely many response functions by ordinary linear regression." Ann. Inst. H. Poincaré Probab. Statist. 59 (1) 53-78, February 2023. https://doi.org/10.1214/22-AIHP1259

Information

Received: 9 September 2020; Revised: 19 November 2021; Accepted: 14 March 2022; Published: February 2023

First available in Project Euclid: 16 January 2023

MathSciNet: MR4533720

zbMATH: 07657643

Digital Object Identifier: 10.1214/22-AIHP1259

Subjects:

Primary: 62G20, 62J05

Keywords: control variates, Monte Carlo integration, multitask learning, ordinary least squares, response surface model
