## Electronic Journal of Statistics

### Query-dependent ranking and its asymptotic properties

#### Abstract

Ranking, known in the machine learning community as learning to rank, orders a set of items by their relevance to a specific query. Most ranking methods in the literature use a single ranking function to evaluate relevance for all queries, which ignores the heterogeneity among queries. To admit different ranking functions for different queries, a general $U$-process formulation for query-dependent ranking is developed. It allows the neighborhood structure among queries to be incorporated via various forms of smoothing weights to improve ranking performance. One of its salient features is its ability to produce reasonable rankings for novel queries that are absent from the training set, a situation commonly encountered in practice but often neglected in the literature. The proposed method is implemented via an inexact alternating direction method of multipliers (ADMM), run for each query in parallel. Its asymptotic risk bound is established, showing that it achieves desirable ranking accuracy at a fast rate for any query, including novel ones. Furthermore, simulated examples and a real application to the Yahoo! challenge dataset support the advantage of the query-dependent ranking method over existing competitors.
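
As a rough illustration of the idea of smoothing weights over queries, the sketch below computes a kernel-weighted pairwise risk for a target query. It is not the estimator from the paper: the Gaussian kernel, the hinge surrogate, the linear scorer, and the names `query_weights` and `weighted_pairwise_hinge` are all assumptions made for this example. Because the weights depend only on the target query's features, the same risk can be evaluated for a query that does not appear in the training set.

```python
import numpy as np

def query_weights(target_q, train_qs, bandwidth=1.0):
    """Normalized Gaussian kernel weights between a target query and the
    training queries (the kernel choice is an assumption of this sketch)."""
    d2 = np.sum((train_qs - target_q) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return w / w.sum()

def weighted_pairwise_hinge(beta, target_q, train_qs, items, labels, bandwidth=1.0):
    """Kernel-weighted pairwise hinge risk of a linear scorer f(x) = x @ beta.

    items[k] is the (n_k, p) feature matrix of the items under training
    query k, and labels[k] holds their relevance scores.
    """
    w = query_weights(target_q, train_qs, bandwidth)
    risk = 0.0
    for w_q, X, y in zip(w, items, labels):
        s = X @ beta
        # Keep ordered pairs (i, j) in which item i is more relevant than item j.
        mask = (y[:, None] - y[None, :]) > 0
        n_pairs = mask.sum()
        if n_pairs == 0:
            continue
        score_gap = (s[:, None] - s[None, :])[mask]
        # Hinge penalty on pairs whose score gap falls below a margin of 1.
        risk += w_q * np.maximum(0.0, 1.0 - score_gap).sum() / n_pairs
    return risk

# Toy data: three training queries with one-dimensional query features.
train_qs = np.array([[0.0], [1.0], [5.0]])
X = np.array([[0.0], [2.0], [4.0]])
y = np.array([0, 1, 2])
items, labels = [X, X, X], [y, y, y]

novel_q = np.array([0.9])  # a query absent from the training set
print(query_weights(novel_q, train_qs))
print(weighted_pairwise_hinge(np.array([1.0]), novel_q, train_qs, items, labels))
```

Minimizing such a risk over `beta` separately for each target query is what would make the resulting ranking function query-dependent; the bandwidth controls how much information is borrowed from neighboring queries.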

#### Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 465-488.

Dates
First available in Project Euclid: 12 February 2019

https://projecteuclid.org/euclid.ejs/1549962032

Digital Object Identifier
doi:10.1214/19-EJS1531

#### Citation

Dai, Ben; Wang, Junhui. Query-dependent ranking and its asymptotic properties. Electron. J. Statist. 13 (2019), no. 1, 465--488. doi:10.1214/19-EJS1531. https://projecteuclid.org/euclid.ejs/1549962032
