Table 15 Multimodal results based on combined text-audio feature selectors

From: Multimodal machine learning for language and speech markers identification in mental health

| Model | Features | Accuracy | AUC-ROC | F1 (class 0) | F1 (class 1) |
|---|---|---|---|---|---|
| SVM | 20t, 15a | 86.17% | 92.80% | 0.91 | 0.79 |
| SVM | 20t, 10a | 86.71% | 92.74% | 0.92 | 0.80 |
| SVM | 15t, 15a | 82.97% | 90.61% | 0.89 | 0.72 |
| RF | 20t, 15a | 79.80% | 84.45% | 0.87 | 0.60 |
| RF | 20t, 10a | 80.87% | 86.39% | 0.87 | 0.63 |
| RF | 15t, 15a | 79.82% | 84.38% | 0.87 | 0.60 |
| LogReg | 20t, 15a | 85.14% | 91.05% | 0.89 | 0.77 |
| LogReg | 20t, 10a | 86.73% | 92.36% | 0.90 | 0.80 |
| LogReg | 15t, 15a | 84.57% | 91.01% | 0.89 | 0.74 |
| Dense Layers | 20t, 15a | 84.59% | 89.55% | 0.90 | 0.76 |
| Dense Layers | 20t, 10a | 84.04% | 90.22% | 0.89 | 0.74 |
| Dense Layers | 15t, 15a | 84.07% | 89.91% | 0.88 | 0.70 |
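
The table reports a separate F1 score for the 0s and the 1s, and in every row the score for the 1s is the lower of the two, which a single accuracy figure would not show. As a minimal sketch of how such per-class scores can be computed, the following uses plain Python. The function names and the toy labels are hypothetical illustrations and are not taken from the paper or its data.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when the class never appears
    # in either the true or the predicted labels.
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (NOT from the paper): six 0s and four 1s.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

acc = accuracy(y_true, y_pred)
f1_zeros = per_class_f1(y_true, y_pred, label=0)
f1_ones = per_class_f1(y_true, y_pred, label=1)

print(f"accuracy={acc:.2f}  F1(0s)={f1_zeros:.2f}  F1(1s)={f1_ones:.2f}")
```

In this toy example the two per-class scores differ noticeably even though they come from the same set of predictions, which mirrors the gap between the last two columns of the table.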