2021 Key n-Gram Extractions and Analyses of Different Registers Based on Attention Network
Haiyan Wu, Ying Liu, Shaoyun Shi, Qingfeng Wu, Yunlong Huang
J. Appl. Math. 2021: 1-16 (2021). DOI: 10.1155/2021/5264090

Abstract

Key n-gram extraction can be seen as extracting the n-grams that distinguish different registers. Keyword extraction (the n = 1 case, since a 1-gram is a keyword) is generally approached from two aspects: feature extraction and model design. After summarizing the advantages and disadvantages of existing models, we propose a novel key n-gram extraction model, the “attentive n-gram network” (ANN), based on the attention mechanism and a multilayer perceptron. The attention mechanism scores each n-gram in a sentence by mining the internal semantic relationships between words, and these scores give the importance of each n-gram. Experimental results on a real corpus show that the key n-grams extracted by our model distinguish novels, news, and textbooks very well, and that the accuracy of our model is significantly higher than that of the baseline model. We also conduct experiments on the key n-grams extracted from these registers, which turn out to be well clustered. Furthermore, we carry out statistical analyses of the key n-gram extraction results and find that the key n-grams extracted by our model are readily interpretable in linguistic terms.
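The idea of scoring each n-gram in a sentence with attention can be illustrated with a minimal sketch. This is not the authors' ANN: the random embeddings, the mean-pooled n-gram vector, the single `query` vector, and the function names are all assumptions made for illustration, and the multilayer perceptron is omitted.

```python
import math
import random


def extract_ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(ngrams, embed, query):
    """Weight each n-gram by softmax(query . mean word vector).

    Mean pooling and a single query vector are illustrative assumptions,
    not the scoring function described in the paper.
    """
    dim = len(query)
    scores = []
    for gram in ngrams:
        mean_vec = [sum(embed[w][d] for w in gram) / len(gram) for d in range(dim)]
        scores.append(sum(q * v for q, v in zip(query, mean_vec)))
    return softmax(scores)


# Toy setup: random vectors stand in for learned parameters (assumption).
random.seed(0)
DIM = 4
tokens = "the model scores each n-gram in a sentence".split()
embed = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in sorted(set(tokens))}
query = [random.uniform(-1, 1) for _ in range(DIM)]

bigrams = extract_ngrams(tokens, 2)
weights = attention_weights(bigrams, embed, query)

# Rank n-grams by attention weight; the top entries play the role of "key" n-grams.
ranked = sorted(zip(bigrams, weights), key=lambda pair: pair[1], reverse=True)
for gram, weight in ranked:
    print(" ".join(gram), round(weight, 3))
```

In a trained model the embeddings and the query would be learned jointly with the register classifier, so that n-grams with high weights are those most useful for telling registers apart.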

Acknowledgments

This work is supported by the Tsinghua University Humanities and Social Sciences Revitalization Project (2019THZWJC38), the Project of Baidu Netcom Technology Co. Ltd. Open Source Course and Case Construction Based on the Deep Learning Framework PaddlePaddle (20202000291), the Distributed Secure Estimation of Multi-sensor Systems Subject to Stealthy Attacks (62073284), the Multi-sensor-based Estimation Theory and Algorithms with Clustering Hierarchical Structures (61603331), and the National Key Research and Development Program of China (2019YFB1406300).

Citation

Haiyan Wu, Ying Liu, Shaoyun Shi, Qingfeng Wu, Yunlong Huang. "Key n-Gram Extractions and Analyses of Different Registers Based on Attention Network." J. Appl. Math. 2021: 1-16 (2021). https://doi.org/10.1155/2021/5264090

Information

Received: 10 July 2020; Accepted: 25 April 2021; Published: 2021
First available in Project Euclid: 28 July 2021

Digital Object Identifier: 10.1155/2021/5264090

Rights: Copyright © 2021 Hindawi

Vol.2021 • 2021