SOLVING UNBALANCED DATA CLASSIFICATION WITH AN APPROACH BASED ON A COMBINATION OF OVERSAMPLING AND UNDERSAMPLING
Abstract
This study applies a combination of oversampling and undersampling to deal with class imbalance in classification. The attributes used for prediction were first normalized in a preprocessing step, and the data were then divided into training and testing sets. The unbalanced data were resampled in three ways: oversampling, undersampling, and a combination of oversampling and undersampling. Among these class-balancing approaches, the best classification performance was obtained with the combination of oversampling and undersampling, classified by the k-Nearest Neighbor (KNN) method, with an accuracy of 0.8672, a sensitivity of 0.9000, a specificity of 0.3750, and an AUC of 0.6651042, the highest AUC of the three approaches. Classification with oversampling alone gave an accuracy of 0.875, a sensitivity of 0.9250, a specificity of 0.1250, and an AUC of 0.6078125, while classification with undersampling alone gave an accuracy of 0.3438, a sensitivity of 0.33333, a specificity of 0.50000, and an AUC of 0.3645833.
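As a rough illustration of the three resampling schemes named in the abstract, the following Python sketch balances a toy two-class dataset by random duplication and random removal. It is not the authors' implementation. The abstract does not say which oversampling or undersampling algorithms were used, so random resampling, the "meet in the middle" target size for the combined scheme, the toy data, and the names `resample_to`, `balance`, and `metrics` are all assumptions made for illustration.

```python
import random
from collections import Counter


def resample_to(rows, size, rng):
    """Return `size` rows drawn from `rows`: a random subset when shrinking,
    or all rows plus random duplicates when growing."""
    if size <= len(rows):
        return rng.sample(rows, size)
    return rows + rng.choices(rows, k=size - len(rows))


def balance(data, strategy, seed=0):
    """Balance a two-class dataset of (features, label) pairs.

    strategy: "over"  -> duplicate minority rows up to the majority count
              "under" -> drop majority rows down to the minority count
              "both"  -> move both classes to the midpoint (assumed target)
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in data)
    (maj, n_maj), (mino, n_min) = counts.most_common(2)
    target = {"over": n_maj, "under": n_min, "both": (n_maj + n_min) // 2}[strategy]
    out = []
    for label in (maj, mino):
        rows = [row for row in data if row[1] == label]
        out.extend(resample_to(rows, target, rng))
    rng.shuffle(out)
    return out


def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy data (hypothetical): 90 majority rows (label 1), 10 minority rows (label 0).
data = [((i,), 1) for i in range(90)] + [((i,), 0) for i in range(10)]
for strategy in ("over", "under", "both"):
    balanced = balance(data, strategy)
    print(strategy, sorted(Counter(label for _, label in balanced).items()))

# Hypothetical confusion counts (NOT taken from the study), chosen only
# because they reproduce the accuracy, sensitivity and specificity reported
# for the combined approach.
print(metrics(tp=108, fn=12, tn=3, fp=5))
```

On the toy data the three strategies yield 90, 10, and 50 rows per class respectively. The final call shows how the three threshold-based measures follow from a confusion matrix. The counts used there are illustrative only, since the abstract does not state the size or class composition of the test set. In practice, resampling is usually applied to the training split only, so that the test set keeps the original class distribution.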