279th LG: The Impact of Random Samples in Ensemble Classifiers

Friday, July 15, 2016 - 12:00
Prathyush Chirra
The use of ensemble classifiers, e.g., Bagging and Boosting, is widespread in machine learning. However, most studies in this area are based on empirical comparisons that pay too little attention to the randomness of these methods. This paper describes the dangers of experiments with ensemble classifiers by analyzing the efficiency of Bagging and Boosting methods over 32 different data sets. The experiments show that variations due to randomness are often larger than the advantages of one method over another reported in the literature. This paper's main contribution is the claim, supported by statistical analysis, that no empirical comparison of ensemble classifiers can be scientifically sound unless it accounts for the random choices made.
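As a rough illustration of the kind of seed-to-seed variation the abstract refers to, the sketch below trains a small Bagging ensemble of decision stumps several times on the same data, changing only the seed that drives the bootstrap sampling. This is not the paper's experimental setup: the synthetic one-dimensional data, the stump learner, and all function names (`make_data`, `fit_stump`, `bagging_accuracy`, `seed_spread`) are assumptions made purely for illustration.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic 1-D, two-class data (an assumption, not the paper's data sets).

    Class 0 is drawn from N(0, 1) and class 1 from N(1, 1).
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(float(label), 1.0), label))
    return data


def fit_stump(sample):
    """Return the threshold t minimising training error of the rule 'x > t -> class 1'."""
    best_t, best_err = None, None
    for t, _ in sample:
        err = sum((x > t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def bagging_accuracy(train, test, n_estimators, seed):
    """Train a Bagging ensemble of stumps; the seed controls only the bootstrap draws."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw len(train) points with replacement.
        boot = [rng.choice(train) for _ in train]
        thresholds.append(fit_stump(boot))
    correct = 0
    for x, y in test:
        votes = sum(x > t for t in thresholds)
        pred = 1 if 2 * votes > n_estimators else 0  # majority vote
        correct += (pred == y)
    return correct / len(test)


def seed_spread(seeds, n_estimators=11):
    """Test accuracy of the same method on the same data, one value per seed."""
    data_rng = random.Random(0)  # the data are fixed; only the ensemble seed varies
    train = make_data(60, data_rng)
    test = make_data(200, data_rng)
    return [bagging_accuracy(train, test, n_estimators, s) for s in seeds]


if __name__ == "__main__":
    accs = seed_spread(range(20))
    print("min accuracy :", min(accs))
    print("max accuracy :", max(accs))
    print("mean accuracy:", statistics.mean(accs))
    print("std deviation:", statistics.stdev(accs))
```

If the gap between the minimum and maximum accuracy across seeds is comparable to the gap claimed between two competing methods, a single-run comparison cannot separate the methods from the noise, which is the concern the abstract raises.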