Open Access
2021 Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit
Ke Li, Yun Yang, Naveen N. Narisetty
Electron. J. Statist. 15(2): 5652-5695 (2021). DOI: 10.1214/21-EJS1909

Abstract

In this paper, we consider the multi-armed bandit problem with high-dimensional features. First, we prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}} T^{\frac{1-\alpha}{2}} + \log T\big)$, for the cumulative regret, in terms of horizon $T$, dimension $d$ and a margin parameter $\alpha \in [0,1]$, which controls the separation between the optimal and the sub-optimal arms. This new lower bound unifies existing regret bound results that have different dependencies on $T$ due to the use of different values of margin parameter $\alpha$ explicitly implied by their assumptions. Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy that achieves a regret upper bound matching the lower bound. The proposed algorithm uses a properly centered $\ell_1$-ball as the confidence set in contrast to the commonly used ellipsoid confidence set. In addition, the algorithm does not require any forced sampling step and is thereby adaptive to the practically unknown margin parameter. Simulations and a real data analysis are conducted to compare the proposed method with existing ones in the literature.
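
As an illustration of how the bound depends on the margin parameter, the two endpoint values of $\alpha$ can be substituted directly into the rate stated in the abstract. This is only a worked substitution, not a result quoted from the paper body:

\[
\alpha = 0:\quad (\log d)^{\frac{1}{2}}\, T^{\frac{1}{2}} + \log T = \sqrt{T \log d} + \log T,
\qquad
\alpha = 1:\quad (\log d)^{1}\, T^{0} + \log T = \log d + \log T .
\]

With no margin ($\alpha = 0$) the rate grows as $\sqrt{T}$, while at the largest margin ($\alpha = 1$) it grows only logarithmically in $T$, which matches the abstract's remark that different values of $\alpha$ lead to different dependencies on $T$.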

Citation


Ke Li. Yun Yang. Naveen N. Narisetty. "Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit." Electron. J. Statist. 15 (2) 5652 - 5695, 2021. https://doi.org/10.1214/21-EJS1909

Information

Received: 1 December 2020; Published: 2021
First available in Project Euclid: 27 December 2021

Digital Object Identifier: 10.1214/21-EJS1909

Subjects:
Primary: 62L05

Keywords: Contextual linear bandit, high-dimension, minimax regret, Sparsity, upper confidence bound

JOURNAL ARTICLE
44 PAGES

Vol.15 • No. 2 • 2021