Open Access
Mining events with declassified diplomatic documents
Yuanjun Gao, Jack Goetz, Matthew Connelly, Rahul Mazumder
Ann. Appl. Stat. 14(4): 1699-1723 (December 2020). DOI: 10.1214/20-AOAS1344

Abstract

Since 1973, the U.S. State Department has used electronic record systems to preserve classified communications. Approximately 1.9 million of these records, dating from 1973–77, have recently been made available by the U.S. National Archives. Some of these communication streams have periods in which the rate of transmission accelerates, while others show no notable pattern in communication intensity. Given the sheer volume of these communications, far greater than anything previously available, scholars need automated statistical techniques to identify the communications that warrant closer study. We develop a statistical framework that can identify, from a large corpus of documents, a handful that historians would consider more interesting. Our approach brings together techniques from nonparametric signal estimation, statistical hypothesis testing and modern optimization methods, leading to a set of tools that help us identify and analyze various geometrical aspects of the communication streams. The dominant periods of heightened activity identified through these methods correspond well with historical events recognized by standard reference works on the 1970s.
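The abstract describes the method only at a high level. As a rough illustration of the "nonparametric signal estimation" ingredient, here is a minimal sketch of a one-dimensional fused lasso (total-variation denoising), the first keyword listed for this article, applied to a stream of daily message counts. It is not the authors' estimator: the squared-error loss, the fixed penalty `lam`, the projected-gradient solver on the dual problem, the helper names `fused_lasso_1d`, `change_points` and `elevated_periods`, the threshold, and the toy counts are all hypothetical choices made for this example. The fit is piecewise constant, so jumps in the fitted level mark candidate boundaries of periods of heightened activity.

```python
from typing import List, Sequence, Tuple


def _primal_from_dual(y: Sequence[float], u: List[float]) -> List[float]:
    # x = y - D^T u, where D is the (n-1) x n first-difference matrix,
    # so (D^T u)_j = u_{j-1} - u_j with out-of-range u terms treated as 0.
    n = len(y)
    return [
        y[j] - (u[j - 1] if j > 0 else 0.0) + (u[j] if j < n - 1 else 0.0)
        for j in range(n)
    ]


def fused_lasso_1d(y: Sequence[float], lam: float,
                   max_iter: int = 100_000, tol: float = 1e-10) -> List[float]:
    """Hypothetical helper: minimise 0.5*sum((y_i - x_i)^2) + lam*sum(|x_{i+1} - x_i|).

    Solved by projected gradient descent on the dual problem
        min_u 0.5*||y - D^T u||^2  subject to  |u_i| <= lam,
    recovering the primal solution as x = y - D^T u.
    """
    n = len(y)
    if n < 2:
        return [float(v) for v in y]
    u = [0.0] * (n - 1)
    step = 0.25  # 1/L, since the Lipschitz constant ||D D^T|| is at most 4
    for _ in range(max_iter):
        x = _primal_from_dual(y, u)
        max_change = 0.0
        for i in range(n - 1):
            # The dual gradient is -D x, so step along (x_{i+1} - x_i), then clip to the box.
            new_ui = min(lam, max(-lam, u[i] + step * (x[i + 1] - x[i])))
            max_change = max(max_change, abs(new_ui - u[i]))
            u[i] = new_ui
        if max_change < tol:
            break
    return _primal_from_dual(y, u)


def change_points(x: Sequence[float], tol: float = 1e-3) -> List[int]:
    """Indices i where the fitted level jumps between day i and day i + 1."""
    return [i for i in range(len(x) - 1) if abs(x[i + 1] - x[i]) > tol]


def elevated_periods(x: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Inclusive (start, end) index pairs of maximal runs where x > threshold."""
    periods: List[Tuple[int, int]] = []
    start = None
    for i, v in enumerate(x):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            periods.append((start, i - 1))
            start = None
    if start is not None:
        periods.append((start, len(x) - 1))
    return periods


if __name__ == "__main__":
    # Toy daily counts (invented for illustration, not data from the paper).
    counts = [1, 1, 1, 1, 10, 10, 10, 10, 1, 1, 1, 1]
    fit = fused_lasso_1d(counts, lam=1.0)
    print([round(v, 3) for v in fit])
    print(change_points(fit), elevated_periods(fit, threshold=5.0))
```

On the toy counts with `lam=1.0`, the fit is approximately 1.25 on days 0–3, 9.5 on days 4–7 and 1.25 on days 8–11, so `change_points` reports `[3, 7]` and `elevated_periods` reports `[(4, 7)]`. The penalty shrinks each level toward its neighbours, and a larger `lam` merges more segments. The dual form is used here only because it needs nothing beyond clipping and first differences; dedicated exact solvers for the 1-D fused lasso exist and would be preferable at scale.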

Citation

Yuanjun Gao. Jack Goetz. Matthew Connelly. Rahul Mazumder. "Mining events with declassified diplomatic documents." Ann. Appl. Stat. 14 (4) 1699 - 1723, December 2020. https://doi.org/10.1214/20-AOAS1344

Information

Received: 1 May 2019; Revised: 1 February 2020; Published: December 2020
First available in Project Euclid: 19 December 2020

MathSciNet: MR4194244
Digital Object Identifier: 10.1214/20-AOAS1344

Keywords: Fused lasso, National Archives, optimization, signal estimation, U.S. History

Rights: Copyright © 2020 Institute of Mathematical Statistics

Vol. 14 • No. 4 • December 2020