Table 16 Results comparison between Unimodal and Multimodal approaches, per ML model

From: Multimodal machine learning for language and speech markers identification in mental health

| Modality - Model    | Features # | Accuracy | AUC-ROC | F1 - 0s | F1 - 1s |
|---------------------|------------|----------|---------|---------|---------|
| Text - SVM          | 20         | 86.77%   | 93.33%  | 0.91    | 0.78    |
| Audio - SVM         | 20         | 68.61%   | 61.58%  | 0.81    | 0.12    |
| Multimodal - SVM    | 20t, 10a   | 86.71%   | 92.74%  | 0.92    | 0.80    |
| Text - RF           | 25         | 85.72%   | 91.75%  | 0.90    | 0.73    |
| Audio - RF          | 15         | 71.83%   | 75.20%  | 0.82    | 0.40    |
| Multimodal - RF     | 20t, 10a   | 80.87%   | 86.39%  | 0.87    | 0.63    |
| Text - LogReg       | 20         | 87.82%   | 92.44%  | 0.91    | 0.79    |
| Audio - LogReg      | 10         | 68.62%   | 59.12%  | 0.80    | 0.21    |
| Multimodal - LogReg | 20t, 10a   | 86.73%   | 89.55%  | 0.90    | 0.80    |
| Text - FCNN         | 25         | 84.11%   | 91.79%  | 0.89    | 0.74    |
| Audio - FCNN        | 10         | 69.70%   | 68.23%  | 0.80    | 0.46    |
| Multimodal - FCNN   | 20t, 15a   | 84.59%   | 89.55%  | 0.90    | 0.76    |

  1. FCNN stands for Fully Connected Neural Network (dense layers in this case).
  2. In the "Features #" column, "20t, 10a" denotes the number of text (t) and audio (a) features used by the multimodal model.
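The per-class F1 columns ("F1 - 0s" and "F1 - 1s") report the F1 score with class 0 and class 1, respectively, treated as the positive label. A minimal stdlib sketch of that computation, using made-up labels (not data from the study) with an imbalanced split similar to the table's, where the majority class 0 scores much higher than class 1:

```python
def per_class_f1(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative labels only: 6 negatives (0) vs 4 positives (1).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

f1_0 = per_class_f1(y_true, y_pred, 0)  # "F1 - 0s"
f1_1 = per_class_f1(y_true, y_pred, 1)  # "F1 - 1s"
```

This gap between the two F1 columns is why accuracy alone is misleading here: a model can score high "F1 - 0s" on the majority class while performing poorly on class 1, as the audio-only SVM and LogReg rows show.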