Empirical Performance of CART, C5.0 and Random Forest Classification Algorithms for Decision Trees

Bissilimou Rachidatou Orounla; Akoeugnigan Idelphonse Sode; Kolawolé Valère Salako; Romain Glèlè Kakaï

doi:10.16929/ajas/2023.1399.274

Abstract

This study compares the performance of the CART, C5.0 and Random Forest (RF) algorithms. Twenty-five continuous predictors and 25 factors were simulated for a population of size 10,000. From this population, samples were drawn by varying the number of predictors, the proportion of categorical versus continuous predictors, and the sample size. The performance of all three algorithms increases with sample size and with the number of variables, but RF performs markedly better than CART and C5.0. Irrespective of the algorithm, performance decreases when there are more categorical than continuous variables.

The present study compares the performance of the CART, C5.0 and Random Forest (RF) algorithms. Twenty-five continuous predictors and 25 factors were simulated from a population of size 10,000. Based on these data, samples were generated by varying the number of predictors, the proportion of categorical relative to continuous predictors, and the sample size. The performance of the algorithms increases with the sample size and the number of variables. That of RF is markedly higher than that of CART and C5.0. Regardless of the algorithm, performance decreases when there are more categorical variables than continuous variables.
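The sampling design described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' procedure: the abstract does not state the distributions used, so continuous predictors are drawn as standard normal and factors as uniform three-level variables; the function names and the grid values for sample size, number of predictors and categorical proportion are hypothetical. Generating the response variable and fitting CART, C5.0 and RF are deliberately left out, since the abstract gives no detail on either.

```python
import itertools
import random


def simulate_population(size=10_000, n_continuous=25, n_factors=25,
                        n_levels=3, seed=1):
    """Simulate a population of rows: (continuous values, factor levels).

    Assumption: standard normal predictors and uniform factor levels.
    """
    rng = random.Random(seed)
    return [
        ([rng.gauss(0.0, 1.0) for _ in range(n_continuous)],
         [rng.randrange(n_levels) for _ in range(n_factors)])
        for _ in range(size)
    ]


def draw_sample(population, sample_size, n_predictors, prop_categorical,
                seed=1):
    """Draw rows without replacement and keep a subset of the predictors.

    prop_categorical is the share of the kept predictors that are factors.
    """
    rng = random.Random(seed)
    n_cat = round(n_predictors * prop_categorical)
    n_cont = n_predictors - n_cat
    cont_available = len(population[0][0])
    cat_available = len(population[0][1])
    if n_cont > cont_available or n_cat > cat_available:
        raise ValueError("requested more predictors than were simulated")
    cont_cols = rng.sample(range(cont_available), n_cont)
    cat_cols = rng.sample(range(cat_available), n_cat)
    rows = rng.sample(population, sample_size)
    return [
        ([cont[i] for i in cont_cols], [cat[j] for j in cat_cols])
        for cont, cat in rows
    ]


def design_grid(sample_sizes=(100, 500, 1000),
                predictor_counts=(10, 20, 30),
                categorical_props=(0.25, 0.5, 0.75)):
    """Enumerate every scenario of the (hypothetical) factorial design."""
    return list(itertools.product(sample_sizes, predictor_counts,
                                  categorical_props))
```

Under these assumptions, each tuple returned by `design_grid()` can be passed to `draw_sample` to obtain one simulated data set per scenario, on which the three classifiers would then be trained and evaluated.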

Citation

Bissilimou Rachidatou Orounla. Akoeugnigan Idelphonse Sode. Kolawolé Valère Salako. Romain Glèlè Kakaï. "Empirical Performance of CART, C5.0 and Random Forest Classification Algorithms for Decision Trees." Afr. J. Appl. Stat. 10 (1) 1399 - 1418, January 2023. https://doi.org/10.16929/ajas/2023.1399.274

Information

Published: January 2023

First available in Project Euclid: 10 November 2023

Digital Object Identifier: 10.16929/ajas/2023.1399.274

Subjects:

Primary: 97R40

Secondary: 65Y20, 68Q25

Keywords: accuracy, categorical variables, non-parametric modeling, simulation, specificity
