The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 12, Number 2 (2018), 1096-1123.
Discovering political topics in Facebook discussion threads with graph contextualization
We propose a graph contextualization method, pairGraphText, to study political engagement on Facebook during the 2012 French presidential election. It is a spectral algorithm that contextualizes graph data with text data for online discussion threads. In particular, we examine the Facebook posts of the eight leading candidates and the comments beneath these posts. We find evidence of both (i) candidate-centered structure, where citizens primarily comment on the wall of one candidate, and (ii) issue-centered structure (i.e., on political topics), where citizens’ attention and expression are primarily directed toward a specific set of issues (e.g., economics, immigration). To identify issue-centered structure, we develop pairGraphText to analyze a network with high-dimensional features on the interactions (i.e., text). This technique scales to hundreds of thousands of nodes and thousands of unique words. In the Facebook data, spectral clustering without the contextualizing text information finds a mixture of (i) candidate and (ii) issue clusters. Contextualizing the graph with text data helps separate these two structures. We conclude by showing that the novel methodology is consistent under a statistical model.
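For readers unfamiliar with the graph-only baseline mentioned in the abstract, the following is a minimal sketch of regularized spectral clustering, assuming an undirected adjacency matrix `A` and NumPy. The optional `X`/`alpha` argument adds a covariate-assisted (CASC-style) similarity term alpha * X Xᵀ, where `X` is a node-by-term matrix; CASC is one of the comparison methods named in the supplementary materials. This is not pairGraphText, which attaches text to the interactions rather than to the nodes and is specified in the paper. The function names, the default regularizer tau (set to the mean degree), and the toy graph are illustrative assumptions.

```python
import numpy as np

def regularized_laplacian(A, tau=None):
    """Return D_tau^{-1/2} A D_tau^{-1/2}; tau defaults to the mean degree (an assumption)."""
    deg = A.sum(axis=1)
    if tau is None:
        tau = deg.mean()
    d_inv_sqrt = 1.0 / np.sqrt(deg + tau)
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def kmeans(Y, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Squared distance from each row to its nearest already-chosen center.
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clusters(A, k, X=None, alpha=0.0, tau=None):
    """Cluster nodes from the top-k eigenvectors of L L (+ alpha X X^T if X is given)."""
    L = regularized_laplacian(A, tau)
    S = L @ L  # same eigenvectors as L, ranked by squared eigenvalue
    if X is not None and alpha > 0:
        S = S + alpha * (X @ X.T)  # hypothetical CASC-style node-covariate term
    eigvals, eigvecs = np.linalg.eigh(S)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Row-normalize the spectral embedding before k-means.
    norms = np.linalg.norm(top, axis=1, keepdims=True)
    top = top / np.where(norms > 0, norms, 1.0)
    return kmeans(top, k)

# Toy graph: two 4-node cliques joined by a single bridging edge.
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

labels = spectral_clusters(A, k=2)

# Toy node-by-term matrix: each clique uses its own word.
X = np.zeros((8, 3))
X[:4, 0] = 1.0
X[4:, 1] = 1.0
labels_text = spectral_clusters(A, k=2, X=X, alpha=0.1)
```

On this toy graph both calls put the two cliques in separate clusters. The weight `alpha` controls how strongly the term matrix pulls the embedding away from the purely graph-based one; how to choose it, and how the paper's method differs, is not determined by this sketch.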
Received: August 2017
Revised: March 2018
First available in Project Euclid: 28 July 2018
Zhang, Yilin; Poux-Berthe, Marie; Wells, Chris; Koc-Michalska, Karolina; Rohe, Karl. Discovering political topics in Facebook discussion threads with graph contextualization. Ann. Appl. Stat. 12 (2018), no. 2, 1096--1123. doi:10.1214/18-AOAS1191. https://projecteuclid.org/euclid.aoas/1532743487
- Supplementary Materials for “Discovering political topics in Facebook discussion threads with graph contextualization”. These supplementary materials consist of five parts. Part 1 provides more evidence for the candidate-centered structure. Part 2 explains our choice of the number of clusters $K$ when searching for the issue-centered structure. Part 3 discusses different choices for document-term matrices. Part 4 provides more simulations comparing pairGraphText with RTM and other methods, including CASC and spectral clustering. Part 5 provides theoretical justification for pairGraphText.