Table 1 Comparison of existing machine learning methods for stroke prediction

From: Improving stroke risk prediction by integrating XGBoost, optimized principal component analysis, and explainable artificial intelligence

| Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Logistic Regression | Models the probability of stroke from risk factors using a sigmoid function fitted by gradient descent; performance is evaluated with and without regularization | High accuracy (over 95%), can be improved through regularization, easy to implement | Limited accuracy on nonlinear relationships; sensitive to parameter selection |
| Decision Tree | Recursively splits the data on feature thresholds, forming a tree of interpretable decision rules | Decisions can be visualized and are easy to interpret | Prone to overfitting without regularization |
| Random Forest | Aggregates the results of multiple decision trees trained on different subsets of the data, with voting for the final decision; this reduces the risk of overfitting | High accuracy (96%) and reliability | Can be slow on large datasets |
| Naive Bayes | Classifies under the assumption that features are independent | Simple and efficient in many classification tasks | Accuracy of only 82%; limited with complex relationships between features |
| k-Nearest Neighbors (k-NN) | Classifies new observations by majority vote of the nearest neighbors in the training set | Simple and transparent | Scales poorly; sensitive to the choice of the k parameter |
| Support Vector Machine (SVM) | Uses kernel functions to handle nonlinearly distributed data | Highly effective on high-dimensional data | Sensitive to parameter selection; struggles with large datasets |
| Deep Learning | Applies convolutional neural networks (CNNs) to medical image analysis | High accuracy in detecting complex patterns | Requires large datasets and substantial computing resources |
| Artificial Neural Networks (ANN) | A model combining resampling, data-leakage avoidance, feature selection, and interpretability techniques (e.g. permutation importance and LIME) for stroke prediction | Highly interpretable via LIME; effective resampling and feature selection; high prediction accuracy (95%) | Depends on external dataset validation and ongoing optimization for better performance |
| XGBoost | A gradient boosting algorithm that combines weak learners to achieve better results | High prognostic efficiency and interpretability; accuracy over 97% | Parameters are difficult to tune; requires computing resources |
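The logistic-regression entry above (a sigmoid model fitted by gradient descent, with optional regularization) can be sketched in plain Python. This is a minimal illustration, not the paper's implementation; the toy feature values are invented for demonstration:

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=500, l2=0.0):
    """Fit logistic regression by stochastic gradient descent.

    X: list of feature vectors; y: list of 0/1 labels.
    l2 > 0 adds L2 regularization on the (non-bias) weights.
    Returns weights [bias, w1, w2, ...].
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi          # gradient of log-loss w.r.t. z
            w[0] -= lr * err               # bias term is not regularized
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * (err * xj + l2 * w[j + 1])
    return w

def predict(w, xi):
    """Predicted probability of the positive class (e.g. stroke)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
```

Training once with `l2=0.0` and once with `l2 > 0` mirrors the table's "with and without regularization" comparison: the penalty shrinks the weights and can reduce overfitting at some cost in training fit.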