Open Access
2021 Sentiment analysis with covariate-assisted word embeddings
Shirong Xu, Ben Dai, Junhui Wang
Electron. J. Statist. 15(1): 3015-3039 (2021). DOI: 10.1214/21-EJS1854

Abstract

Sentiment analysis measures the inclination of textual documents, aiming to extract and quantify their subjective sentiment polarity. In the literature, most sentiment analysis methods first convert textual documents into numerical form through a word embedding framework, and then formulate sentiment analysis as an ordinal regression or classification task. Yet it is often ignored that different people may have different wording preferences, and thus uniform word embeddings often lead to suboptimal performance. In this article, to accommodate the heterogeneity among individuals, we propose covariate-assisted word embeddings in a margin-based ordinal regression framework, where covariates are incorporated through scaling factors to adjust the word embeddings. Moreover, we employ a block-wise coordinate descent scheme to tackle the resultant large-scale optimization task, and establish theoretical results to quantify the asymptotic behavior of the proposed method, guaranteeing its fast convergence rate in terms of prediction accuracy. Finally, we demonstrate the advantages of the proposed method over its competitors on both the Yelp Challenge dataset and synthetic datasets.
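
The abstract does not state the exact model, so the following is only an illustrative sketch of the general idea of adjusting shared word embeddings with covariate-dependent scaling factors and then mapping a score to an ordinal label. The exponential form of the scaling, the averaging of word vectors, and all names (`scaled_document_embedding`, `predict_rating`, `scale_weights`, `beta`, `thresholds`) are assumptions made for illustration and are not taken from the paper. The margin-based loss and the block-wise coordinate descent fitting are not shown.

```python
import numpy as np


def scaled_document_embedding(word_ids, embeddings, covariates, scale_weights):
    """Average a document's word vectors after covariate-dependent scaling.

    Assumed (hypothetical) form: each embedding dimension is multiplied by
    exp(covariates @ scale_weights), so all-zero weights recover the
    uniform, non-personalized embeddings.
    """
    scale = np.exp(covariates @ scale_weights)  # shape: (dim,)
    return (embeddings[word_ids] * scale).mean(axis=0)


def predict_rating(score, thresholds):
    """Map a real-valued score to an ordinal label 1..K via sorted cut points."""
    return int(np.searchsorted(thresholds, score, side="right")) + 1


rng = np.random.default_rng(0)
vocab_size, dim, n_cov = 50, 4, 3

embeddings = rng.normal(size=(vocab_size, dim))  # shared word embeddings
scale_weights = np.zeros((n_cov, dim))           # zero weights: no personalization
beta = rng.normal(size=dim)                      # regression coefficients
thresholds = np.array([-1.0, 0.0, 1.0])          # 3 cut points give ratings 1..4

word_ids = np.array([3, 17, 42])                 # words appearing in one document
covariates = np.array([1.0, 0.0, 2.0])           # covariates of the document's author

doc_vec = scaled_document_embedding(word_ids, embeddings, covariates, scale_weights)
rating = predict_rating(doc_vec @ beta, thresholds)
```

With `scale_weights` set to zero the document vector is the plain average of the word vectors; nonzero weights would stretch or shrink individual embedding dimensions differently for authors with different covariates.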

Funding Statement

This work is supported in part by HK RGC grants GRF-11303918, GRF-11300919 and GRF-11304520.

Acknowledgments

The authors are also grateful to the editor, associate editor, and anonymous reviewers for their constructive comments and suggestions, which have significantly improved the manuscript.

Citation

Shirong Xu, Ben Dai, Junhui Wang. "Sentiment analysis with covariate-assisted word embeddings." Electron. J. Statist. 15(1): 3015-3039, 2021. https://doi.org/10.1214/21-EJS1854

Information

Received: 1 October 2020; Published: 2021
First available in Project Euclid: 4 June 2021

Digital Object Identifier: 10.1214/21-EJS1854

Subjects:
Primary: 62H30

Keywords: ordinal regression, personalized prediction, sentiment analysis, unstructured data, word embeddings

Vol.15 • No. 1 • 2021