Table 2. Detailed results for seven machine learning models and three deep learning models. The first section reports the mean sensitivity (mSen.) and mean accuracy (mAcc.) of the machine learning models under two word embedding techniques, Word2Vec and TF-IDF. The second section reports sensitivity and accuracy for the deep learning models. Every score is accompanied by its confidence interval in parentheses, and results are shown separately for unprocessed (U-Data) and preprocessed (P-Data) data.

From: TECRR: a benchmark dataset of radiological reports for BI-RADS classification with machine learning, deep learning, and large language model baselines

Machine learning methods

| Model | Word2Vec: U-Data (mSen.) | Word2Vec: P-Data (mSen.) | Word2Vec: U-Data (mAcc.) | Word2Vec: P-Data (mAcc.) | TF-IDF: U-Data (mSen.) | TF-IDF: P-Data (mSen.) | TF-IDF: U-Data (mAcc.) | TF-IDF: P-Data (mAcc.) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KNN | 0.33 (0.238-0.422) | 0.30 (0.210-0.389) | 0.78 (0.755-0.806) | 0.78 (0.757-0.808) | 0.34 (0.247-0.433) | 0.36 (0.266-0.454) | 0.81 (0.792-0.840) | 0.80 (0.782-0.830) |
| SVM | 0.20 (0.122-0.278) | 0.20 (0.122-0.278) | 0.77 (0.750-0.801) | 0.77 (0.750-0.801) | 0.44 (0.343-0.537) | 0.42 (0.323-0.517) | 0.85 (0.837-0.880) | 0.85 (0.835-0.878) |
| NB | 0.20 (0.122-0.278) | 0.24 (0.156-0.324) | 0.76 (0.738-0.791) | 0.70 (0.673-0.729) | 0.20 (0.247-0.433) | 0.22 (0.266-0.454) | 0.78 (0.787-0.835) | 0.79 (0.789-0.837) |
| RF | 0.20 (0.122-0.278) | 0.20 (0.122-0.278) | 0.77 (0.750-0.801) | 0.77 (0.750-0.801) | 0.34 (0.247-0.433) | 0.36 (0.266-0.454) | 0.81 (0.787-0.835) | 0.81 (0.789-0.837) |
| AdaBoost | 0.33 (0.238-0.422) | 0.35 (0.256-0.443) | 0.65 (0.627-0.686) | 0.61 (0.589-0.649) | 0.33 (0.238-0.422) | 0.33 (0.238-0.422) | 0.41 (0.389-0.450) | 0.74 (0.715-0.769) |
| GB | 0.30 (0.210-0.389) | 0.31 (0.219-0.400) | 0.79 (0.773-0.823) | 0.79 (0.774-0.824) | 0.43 (0.333-0.527) | 0.45 (0.352-0.547) | 0.84 (0.818-0.863) | 0.85 (0.829-0.872) |
| XGB | 0.33 (0.238-0.422) | 0.33 (0.238-0.422) | 0.80 (0.778-0.828) | 0.80 (0.784-0.832) | 0.49 (0.392-0.588) | 0.52 (0.422-0.618) | 0.85 (0.838-0.881) | 0.86 (0.840-0.883) |

Deep learning methods

| Model | U-Data (mSen.) | P-Data (mSen.) | U-Data (Accuracy) | P-Data (Accuracy) |
| --- | --- | --- | --- | --- |
| LSTM | 0.42 (0.346-0.490) | 0.53 (0.455-0.619) | 0.70 (0.673-0.730) | 0.78 (0.753-0.805) |
| BERT | 0.40 (0.331-0.459) | 0.54 (0.477-0.607) | 0.72 (0.692-0.746) | 0.79 (0.768-0.819) |
| BioGPT | 0.45 (0.235-0.669) | 0.60 (0.391-0.811) | 0.74 (0.710-0.764) | 0.80 (0.772-0.822) |
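
The table does not state how its confidence intervals were computed, so the following is only a minimal sketch of one common choice, the normal-approximation (Wald) interval for a proportion. The function name `wald_ci`, the critical value `z = 1.96` (a 95% level), and the sample size `n = 100` are all assumptions made for illustration and are not taken from the source. Under those assumptions, a sensitivity of 0.20 gives bounds that match the (0.122-0.278) interval reported for that score.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    p: observed proportion (e.g. a sensitivity or accuracy score)
    n: number of samples the proportion was computed on (assumed here)
    z: critical value; 1.96 corresponds to a 95% level (assumed here)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    # Clip to [0, 1] so the interval stays a valid range for a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical inputs: a score of 0.20 on an assumed 100 samples.
low, high = wald_ci(0.20, 100)
print(f"({low:.3f}-{high:.3f})")  # prints (0.122-0.278)
```

The Wald interval is simple but can be inaccurate for small samples or for proportions near 0 or 1; a Wilson or bootstrap interval is a common alternative in those cases.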