Table 2 Performance of ML algorithms on original imbalanced data vs. data balanced with class balancing and SMOTE

From: Development of a machine learning prediction model for loss to follow-up in HIV care using routine electronic medical records in a low-resource setting

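| ML algorithm | Comparison metric | Original imbalanced classes (%) | Class balancer (%) | SMOTE (%) |
|---|---|---|---|---|
| RF | Accuracy | 84.8 | 84.1 | 84.2 |
| RF | Sensitivity | 68.3 | 82.3 | 82.4 |
| RF | AUC | 89.1 | 89.0 | 89.5 |
| J48 | Accuracy | 85.2 | 83.7 | 83.9 |
| J48 | Sensitivity | 66.8 | 82.2 | 82.5 |
| J48 | AUC | 87.2 | 87.8 | 88.0 |
| K-NN | Accuracy | 84.9 | 84.0 | 84.2 |
| K-NN | Sensitivity | 66.8 | 82.3 | 82.4 |
| K-NN | AUC | 89.1 | 89.1 | 89.5 |
| SVM | Accuracy | 85.0 | 81.8 | 81.9 |
| SVM | Sensitivity | 66.3 | 80.1 | 80.0 |
| SVM | AUC | 79.8 | 81.8 | 81.8 |
| LR | Accuracy | 84.6 | 81.1 | 81.7 |
| LR | Sensitivity | 68.2 | 80.3 | 78.1 |
| LR | AUC | 88.6 | 88.5 | 88.5 |
| Naïve Bayes | Accuracy | 83.9 | 81.2 | 81.8 |
| Naïve Bayes | Sensitivity | 70.9 | 79.3 | 75.9 |
| Naïve Bayes | AUC | 88.3 | 88.3 | 88.3 |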

  1. Abbreviations: AUC = area under the curve, J48 = decision tree algorithm based on C4.5 (as implemented in Weka), K-NN = k-nearest neighbors, LR = logistic regression, ML = machine learning, RF = random forest, SMOTE = synthetic minority oversampling technique, SVM = support vector machine.
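
One way to read the table is to compare sensitivity across columns, since that is the metric that moves most when the classes are balanced. The sketch below is only an illustration: it copies the sensitivity values from the table and computes the change, in percentage points, from the original imbalanced data to SMOTE. The names `sensitivity` and `smote_gain` are hypothetical and are not taken from the study.

```python
# Sensitivity (%) copied from the table: (original imbalanced, class balancer, SMOTE).
# The variable names are illustrative assumptions, not part of the study's code.
sensitivity = {
    "RF": (68.3, 82.3, 82.4),
    "J48": (66.8, 82.2, 82.5),
    "K-NN": (66.8, 82.3, 82.4),
    "SVM": (66.3, 80.1, 80.0),
    "LR": (68.2, 80.3, 78.1),
    "Naïve Bayes": (70.9, 79.3, 75.9),
}

# Change in sensitivity (percentage points) from original data to SMOTE,
# rounded to one decimal to match the table's precision.
smote_gain = {
    algo: round(smote - original, 1)
    for algo, (original, _balancer, smote) in sensitivity.items()
}

# Print the algorithms from largest to smallest gain.
for algo, gain in sorted(smote_gain.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{algo}: +{gain} points")
```

With the values above, the largest sensitivity gain under SMOTE is for J48 (15.7 points) and the smallest is for Naïve Bayes (5.0 points), while accuracy and AUC in the table change by at most a few points.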