
Table 3 Machine learning model characteristics from selected articles

From: Machine learning applications in studying mental health among immigrants and racial and ethnic minorities: an exploratory scoping review

| First author (year) | Outcome variable | Predictors (input variables) | ML technique | Cross-validation method (I, internal; E, external) | Type | Program used | Best algorithm performance |
|---|---|---|---|---|---|---|---|
| Acion (2017) [36] | Substance abuse treatment success | 28 predictors: 10 patient characteristics, 3 treatment factors, referral type, problematic substance characteristics, and mental health problem | LR, RLR, Lasso-LR, EN, RF, DNN, EL | Two-fold cross-validation (I) | Classification | R; H2O R interface and package rROC | AUC: 0.793–0.820; best model: EL |
| Augsburger (2017) [31] | Risk-taking behavior as measured using the balloon analog risk task (BART) | Exposure to different types of childhood maltreatment, experiences of war and torture, lifetime traumatic events, symptoms of depression and PTSD, and sociodemographic factors | Stochastic GBM | Tenfold cross-validation with three repetitions (I) | Regression | R; gbm and caret | RMSE: 18.70; R²: 0.20 |
| Baird (2022) [35] | Psychological trauma as measured on the GHQ-12 | 18 digitally coded features in self-portraits and free drawings | Lasso-R (single model) | K-fold cross-validation (I) | Regression | Not reported | R²: 0.108 |
| Castilla-Puentes (2021) [41] | Tone, topics, and attitude of digital conversations | Digital conversations | NLP and text mining | Not used | Unsupervised (topic modeling) | CulturIntel | Not reported |
| Choi (2020) [32] | Psychological distress as measured using the Kessler Psychological Distress Scale (K10) | Demographic characteristics, three types of discrimination characteristics, and three types of coping mechanisms | ANN | Not used | Classification | SPSS | AUC: 0.806 |
| Drydakis (2021) [33] | Increased level of integration, overall health, and mental health | Number of mobile applications in use that facilitate immigrants’ societal integration | Linear regression | Not used | Regression | Not reported | p < 0.005 |
| Erol (2022) [34] | Symptom severity of depression and PTSD | Demographic data, PTSD and depression levels, access to food and education, and changes in family income | Linear regression | Not used | Regression | SPSS | R²: 0.123 |
| Goldstein (2022) [37] | Suicidal ideation in the past year | Experience of discrimination and demographics | Deep-learning NLP algorithms and LR | Not used | Classification | Not reported | Not reported |
| Haroz (2020) [39] | Suicide attempts, measured at 6, 12, and 24 months after an initial suicide-related event | 73 predictors: demographic characteristics, educational history, past mental health, substance use, living status, history of domestic violence, participation in tribal activities, knowing anyone who died by suicide in their lifetime, and number of indexed events | RF, SVM, Lasso-R, RLR | Repeated cross-validation with 10 iterations (I) | Classification | Not reported | AUC: 0.87 |
| Huber (2020) [38] | Migrant status | 653 variables | LR, DT, SVM, and naive Bayes | 5-fold cross-validation (I) | Classification | Not reported | Accuracy: 74.5%; AUC: 0.75; best model: DT |
| Khatua (2021) [40] | Tweets that fall into three themes: generic views, initial struggles, and subsequent settlement | Tweets | Bi-LSTM, CNN, BERT | Training/test split | Classification | Python | F1 score: 61.61–75.89% |
| Liu (2021) [43] | MH diagnosis from EHR | Copy number variation | Multi-layer perceptron | Two-fold random shuffle test validation (I) | Classification | Python; scikit-learn | Accuracy: 65.7% |
| Liu (2021) [42] | ADHD diagnosis | Copy number variation | Multi-layer perceptron | Two-fold random shuffle test validation (E) | Classification | Python; scikit-learn | Accuracy: 75.4% |

Abbreviations: logistic regression (LR), ridge logistic regression (RLR), Least Absolute Shrinkage and Selection Operator logistic regression (Lasso-LR), elastic net (EN), random forests (RF), deep learning neural network (DNN), ensemble learning (EL), Least Absolute Shrinkage and Selection Operator regression (Lasso-R), gradient boosting machine (GBM), natural language processing (NLP), artificial neural network (ANN), decision tree (DT), support vector machine (SVM), bidirectional long short-term memory (Bi-LSTM), convolutional neural network (CNN), Bidirectional Encoder Representations from Transformers (BERT), area under the receiver operating characteristic curve (AUC), root-mean-square error (RMSE), posttraumatic stress disorder (PTSD), General Health Questionnaire-12 (GHQ-12), attention-deficit/hyperactivity disorder (ADHD), mental health (MH), electronic health records (EHR)
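The validation schemes and performance metrics listed in the table are standard, and a short sketch may help readers compare them. The Python below is illustrative only and is not code from any of the reviewed studies (which used R, SPSS, Python, and other tools). The helper names (`repeated_kfold_indices`, `auc`, `rmse`, `r_squared`, `f1`) are hypothetical, and only the standard library is used. Setting k = 2 with one repetition corresponds to the two-fold cross-validation listed for Acion (2017), and k = 10 with three repetitions to the scheme listed for Augsburger (2017). AUC is computed with the pairwise (Mann-Whitney) formulation, and F1 is shown for the binary case only; multi-class tasks such as Khatua (2021) additionally require an averaging choice (e.g., macro or weighted).

```python
import math
import random


# Hypothetical helpers for illustration; not taken from any reviewed study.

def repeated_kfold_indices(n, k, repeats, seed=0):
    """Yield (train, test) index lists for `repeats` rounds of k-fold CV.

    Each round reshuffles the indices and splits them into k folds, so every
    observation lands in the test fold exactly once per round.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    for _ in range(repeats):
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def auc(labels, scores):
    """ROC AUC: probability a random positive scores above a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root-mean-square error of a regression model."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def f1(labels, preds):
    """Binary F1 score (harmonic mean of precision and recall): 2TP / (2TP + FP + FN)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn)


if __name__ == "__main__":
    # Toy numbers, purely illustrative (not data from any study in the table).
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    splits = list(repeated_kfold_indices(n=10, k=10, repeats=3))
    print(len(splits))  # 30: ten folds times three repetitions
```

Reusing one shuffled ordering per round guarantees that every observation is tested exactly once per repetition, which is what distinguishes repeated k-fold cross-validation from repeated random train/test splits.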