SOLUTION OF UNBALANCED DATA CLASSIFICATION WITH A BASED APPROACH COMBINATION OF OVERSAMPLING AND UNDERSAMPLING

  • Riza Susanto Banner Student
  • Irwan Budiman
  • Dodon Turianto Nugrahadi
  • M. Reza Faisal
  • Friska Abadi
Keywords: Oversampling, Undersampling, k-Nearest Neighbor (KNN).

Abstract

This study applies a combination of oversampling and undersampling to deal with class imbalance. The attributes used for prediction are first normalized in a preprocessing step, and the data are then divided into training data and testing data. The unbalanced data are resampled using oversampling, undersampling, and a combination of oversampling and undersampling. Among the class-balancing approaches tested, the combination of oversampling and undersampling, classified with the k-Nearest Neighbor (KNN) method, gives the best classification performance, with an accuracy of 0.8672, sensitivity of 0.9000, specificity of 0.3750, and AUC of 0.6651042. Classification with oversampling alone yields an accuracy of 0.875, sensitivity of 0.9250, specificity of 0.1250, and AUC of 0.6078125, while classification with undersampling alone yields an accuracy of 0.3438, sensitivity of 0.33333, specificity of 0.50000, and AUC of 0.3645833.
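The three resampling strategies compared in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which resampling algorithm or target class sizes were used, so the sketch assumes simple random resampling, and the "combined" strategy assumes both classes meet at the midpoint between the minority and majority counts. The function names (`resample_to`, `balance`, `sensitivity_specificity`) and the toy data are hypothetical. The helper for sensitivity and specificity takes the positive class as a parameter, since the abstract does not state which class is treated as positive.

```python
import random
from collections import Counter


def resample_to(rows, labels, target, seed=0):
    """Return data in which every class has exactly `target` rows.

    Classes with fewer rows are randomly oversampled (duplicates drawn with
    replacement); classes with more rows are randomly undersampled (rows
    drawn without replacement).
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    new_rows, new_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target:
            chosen = rng.sample(members, target)  # undersample
        else:
            extra = rng.choices(members, k=target - len(members))
            chosen = members + extra  # oversample
        new_rows.extend(chosen)
        new_labels.extend([label] * target)
    return new_rows, new_labels


def balance(rows, labels, strategy, seed=0):
    """Balance classes with 'over', 'under', or 'combined' resampling."""
    counts = Counter(labels)
    smallest, largest = min(counts.values()), max(counts.values())
    targets = {
        "over": largest,    # grow every class to the majority size
        "under": smallest,  # shrink every class to the minority size
        # Assumption: the combined strategy meets in the middle.
        "combined": (smallest + largest) // 2,
    }
    return resample_to(rows, labels, targets[strategy], seed)


def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (invented for illustration): 10 majority rows, 2 minority rows.
toy_rows = [[float(i)] for i in range(12)]
toy_labels = ["neg"] * 10 + ["pos"] * 2

for strategy in ("over", "under", "combined"):
    _, balanced_labels = balance(toy_rows, toy_labels, strategy)
    print(strategy, dict(Counter(balanced_labels)))
```

In a pipeline like the one described, resampling would normally be applied to the training data only, so that the testing data keep the original class distribution and the reported sensitivity and specificity reflect performance on unbalanced data.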

Published
2022-10-03
How to Cite
Banner, R. S., Irwan Budiman, Dodon Turianto Nugrahadi, M. Reza Faisal, & Friska Abadi. (2022). SOLUTION OF UNBALANCED DATA CLASSIFICATION WITH A BASED APPROACH COMBINATION OF OVERSAMPLING AND UNDERSAMPLING. Journal of Data Science and Software Engineering, 3(01), 1-10. Retrieved from https://jurnalmahasiswamipa.ulm.ac.id/index.php/integer/article/view/63
Section
Articles