In this paper, we propose a model selection method for tree tensor networks in an empirical risk minimization framework and analyze its performance over a wide range of smoothness classes. Tree tensor networks, or tree-based tensor formats, are prominent model classes for the approximation of high-dimensional functions in numerical analysis and data science. They correspond to sum-product neural networks with a sparse connectivity associated with a dimension partition tree T, widths given by a tuple r of tensor ranks, and multilinear activation functions (or units). The approximation power of these model classes has been proved to be optimal (or near-optimal) for classical smoothness classes. However, in an empirical risk minimization framework with a limited number of observations, the dimension tree T and ranks r should be selected carefully to balance estimation and approximation errors. We propose a complexity-based model selection strategy à la Barron, Birgé, Massart. Given a family of model classes associated with different trees, ranks, tensor product feature spaces and sparsity patterns for sparse tensor networks, a model is selected by minimizing a penalized empirical risk, with a penalty depending on the complexity of the model class. After deriving bounds on the metric entropy of tree tensor networks with bounded parameters, we deduce a form of the penalty from bounds on suprema of empirical processes. This choice of penalty yields a risk bound for the predictor associated with the selected model. In a least-squares setting, after deriving fast rates of convergence of the risk, we show that the proposed strategy is (nearly) minimax adaptive to a wide range of smoothness classes including Sobolev or Besov spaces (with isotropic, anisotropic or mixed dominating smoothness) and analytic functions. We discuss the role of sparsity of the tensor network for obtaining optimal performance in several regimes.
In practice, the amplitude of the penalty is calibrated with a slope heuristics method. Numerical experiments in a least-squares regression setting illustrate the performance of the strategy for the approximation of multivariate functions and univariate functions identified with tensors by tensorization (quantization).
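As a rough illustration of selection by penalized empirical risk, the following sketch picks a model among nested candidates by minimizing the empirical least-squares risk plus a complexity penalty. It is a toy under stated assumptions, not the method of the paper: the candidate models are univariate Legendre polynomial fits standing in for tensor network model classes, the complexity is taken to be the number of coefficients, the penalty form `lam * complexity / n` is assumed for illustration, and the amplitude `lam` is fixed by hand rather than calibrated with the slope heuristics. The names `select_model`, `lam`, and `degrees` are hypothetical.

```python
import numpy as np


def select_model(risks, complexities, n, lam):
    """Return (index, criterion) for the model minimizing
    empirical risk + lam * complexity / n.

    The penalty form is an assumption made for this sketch.
    """
    risks = np.asarray(risks, dtype=float)
    complexities = np.asarray(complexities, dtype=float)
    criterion = risks + lam * complexities / n
    return int(np.argmin(criterion)), criterion


# Synthetic least-squares regression data (illustrative only).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(n)

# Nested candidate models: Legendre polynomial fits of increasing degree,
# used here as a stand-in for model classes indexed by tree and ranks.
degrees = list(range(0, 11))
risks = []
for d in degrees:
    coef = np.polynomial.legendre.legfit(x, y, d)
    pred = np.polynomial.legendre.legval(x, coef)
    risks.append(float(np.mean((y - pred) ** 2)))

# Complexity of each model: number of fitted coefficients.
complexities = [d + 1 for d in degrees]

# Penalty amplitude chosen by hand for this sketch.
best, criterion = select_model(risks, complexities, n, lam=0.05)
print("selected degree:", degrees[best])
```

Increasing `lam` favors smaller models and decreasing it favors larger ones, which is why a data-driven calibration of the amplitude matters in practice.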
The authors acknowledge AIRBUS Group for the financial support with the project AtRandom.
The authors would like to thank the anonymous referees, an Associate Editor and the Editor for their constructive comments that improved the quality of this paper.
"Learning with tree tensor networks: Complexity estimates and model selection." Bernoulli 28 (2) 910 - 936, May 2022. https://doi.org/10.3150/21-BEJ1371