African Journal of Applied Statistics

Classification of the Gambian Online Newspapers by keywords : an unsupervized study and Data Streaming Platform

Gane Samb LO, Babanding SANYANG, and Saidou MS BADJIE

Abstract

In this paper, we launch a regional project on knowledge retrieval from African online newspapers. We first focus on the Gambian context and carry out an unsupervised learning process on such newspapers. With the help of appropriate, purpose-built computer packages, we studied the keywords that are likely to discriminate between categories of articles (agriculture, health, politics, etc.). We found 681 words that could help build an efficient classifier of article categories and that could serve to build a metric on which standard or Big Data classification methods operate. The success of the study motivates setting up a data streaming platform for regions of the world, Africa for example.

Résumé (English translation)

In this article, we begin a regional project on knowledge retrieval from African online newspapers. We first concentrate on the Gambian context and undertake an unsupervised learning process on these newspapers. With the help of appropriate and specially designed software, we studied the keywords that are likely to discriminate between categories of articles (agriculture, health, politics, etc.). We found 681 words that could help build an efficient classifier of article categories and that would serve to build a metric on which standard classification methods or Big Data classification methods are used. The success of the study is a reason to establish a data streaming platform for regions of the world, Africa for example, for the continuous detection of keywords and of their adaptation.

Article information

Source
Afr. J. Appl. Stat., Volume 5, Number 1 (2018), 377-401.

Dates
First available in Project Euclid: 16 May 2019

Permanent link to this document
https://projecteuclid.org/euclid.ajas/1557972185

Digital Object Identifier
doi:10.16929/ajas/377.221

Subjects
Primary:
62-07: Data analysis
62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
60-04: Explicit machine computation and programs (not the theory of computation or programming)
60-08: Computational methods (not classified at a more specific level) [See also 65C50]

Keywords
data-mining; web-mining; keywords and patterns; knowledge retrieval; unsupervized learning process; statistical learning; computer packages

Citation

LO, Gane Samb; SANYANG, Babanding; BADJIE, Saidou MS. Classification of the Gambian Online Newspapers by keywords : an unsupervized study and Data Streaming Platform. Afr. J. Appl. Stat. 5 (2018), no. 1, 377--401. doi:10.16929/ajas/377.221. https://projecteuclid.org/euclid.ajas/1557972185

