Abstract
Sentiment analysis measures the inclination of textual documents, aiming to extract and quantify their subjective sentiment polarity. In the literature, most sentiment analysis methods first numericalize textual documents through a word embedding framework and then formulate sentiment analysis as an ordinal regression or classification task. Yet it is often ignored that different people may have different preferences in wording, and thus a uniform word embedding often leads to suboptimal performance. In this article, to accommodate the heterogeneity among individuals, we propose covariate-assisted word embeddings in a margin-based ordinal regression framework, where covariates are incorporated through scaling factors that adjust the word embeddings. Moreover, we employ a block-wise coordinate descent scheme to tackle the resulting large-scale optimization task, and establish theoretical results that quantify the asymptotic behavior of the proposed method and guarantee its fast convergence rate in terms of prediction accuracy. Finally, we demonstrate the advantages of the proposed method over its competitors on both the Yelp Challenge dataset and synthetic datasets.
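The following minimal Python sketch makes the abstract's two ingredients (covariate-scaled word embeddings and a margin-based ordinal loss) concrete. It is an illustration under stated assumptions, not the authors' formulation. It assumes, hypothetically, that each word w receives a positive scaling factor exp(gamma_w · x) for a reviewer covariate vector x, that a document is represented by the average of its rescaled word vectors, and that the margin-based ordinal loss is the all-threshold hinge loss with ascending thresholds. The function names `doc_embedding`, `predict_label` and `all_threshold_hinge` and the toy data are invented for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))


def doc_embedding(words: Sequence[str], emb: Dict[str, Vector],
                  gamma: Dict[str, Vector], x: Vector) -> Vector:
    """Average of word vectors, each rescaled by a covariate-dependent factor.

    Hypothetical scaling factor: s_w(x) = exp(gamma_w . x), which stays positive.
    """
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    n = 0
    for w in words:
        if w not in emb:
            continue  # skip out-of-vocabulary words
        s = math.exp(dot(gamma[w], x))
        total = [t + s * v for t, v in zip(total, emb[w])]
        n += 1
    return [t / n for t in total] if n else total


def predict_label(score: float, thresholds: Sequence[float]) -> int:
    """Ordinal label in 1..K: one plus the number of ascending thresholds below the score."""
    return 1 + sum(score > t for t in thresholds)


def all_threshold_hinge(score: float, y: int, thresholds: Sequence[float]) -> float:
    """Margin-based ordinal loss: the score should exceed threshold k by a unit
    margin when y > k, and fall below it by a unit margin otherwise."""
    loss = 0.0
    for k, t in enumerate(thresholds, start=1):
        z = 1.0 if y > k else -1.0
        loss += max(0.0, 1.0 - z * (score - t))
    return loss


# Toy data (invented for illustration): 2-d word vectors, one covariate.
emb = {"great": [1.0, 0.0], "bad": [-1.0, 0.0], "food": [0.0, 1.0]}
gamma = {"great": [0.5], "bad": [0.0], "food": [0.0]}
beta = [2.0, 0.0]                # linear scoring weights
thresholds = [-1.0, 0.0, 1.0]    # K = 4 rating levels

review = ["great", "food"]
score_low = dot(beta, doc_embedding(review, emb, gamma, [0.0]))
score_high = dot(beta, doc_embedding(review, emb, gamma, [2.0]))
print(predict_label(score_low, thresholds), predict_label(score_high, thresholds))  # 3 4
print(all_threshold_hinge(score_low, 3, thresholds))  # 1.0
```

In this toy run the same review is rated 3 for one reviewer and 4 for another, because the second reviewer's covariate amplifies the word "great". Exponential scaling is only one convenient way to keep the factors positive, and the all-threshold hinge is only one common margin-based ordinal loss. Estimating the word vectors, gamma, beta and the thresholds (which the article does by block-wise coordinate descent) is not shown here.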
Funding Statement
This work is supported in part by HK RGC grants GRF-11303918, GRF-11300919 and GRF-11304520.
Acknowledgments
The authors are also grateful to the editor, associate editor, and anonymous reviewers for their constructive comments and suggestions, which have significantly improved the manuscript.
Citation
Shirong Xu. Ben Dai. Junhui Wang. "Sentiment analysis with covariate-assisted word embeddings." Electron. J. Statist. 15 (1) 3015 - 3039, 2021. https://doi.org/10.1214/21-EJS1854