- Research
- Open access
Machine learning-based prediction of post-induction hypotension: identifying risk factors and enhancing anesthesia management
BMC Medical Informatics and Decision Making volume 25, Article number: 96 (2025)
Abstract
Background
Post-induction hypotension (PIH) increases the risk of surgical complications, including myocardial injury, acute kidney injury, delirium, stroke, and prolonged hospitalization, and can be life-threatening. Machine learning is an effective tool for analyzing large amounts of data and identifying factors associated with perioperative complications. This study aims to identify risk factors for PIH and develop predictive models to support anesthesia management.
Methods
A dataset of 5406 patients was analyzed using machine learning methods. Logistic regression, random forest, XGBoost, and neural network models were compared. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), calibration curves, and decision curve analysis (DCA).
Results
The logistic regression model achieved an AUROC of 0.74 (95% CI: 0.71–0.77), outperforming the random forest (AUROC: 0.71), XGBoost (AUROC: 0.72), and neural network (AUROC: 0.72) models. In terms of calibration, logistic regression demonstrated superior performance, as reflected by Brier Scores and calibration curves, followed by XGBoost, random forest, and neural network. Decision curve analysis indicated that the logistic regression model provided the greatest clinical utility among all models. Baseline blood pressure, age, sex, type of surgery, platelet count, and certain anesthesia-inducing drugs were identified as important features.
Conclusions
This study provides a valuable tool for personalized preoperative risk assessment and customized anesthesia management, allowing for early intervention and improved patient outcomes. Integration of machine learning models into electronic medical record systems can facilitate real-time risk assessment and prediction.
Introduction
Post-induction hypotension (PIH) is a common yet dangerous adverse event. It increases the risk of surgical complications, including myocardial injury, acute kidney injury, delirium, stroke, and prolonged hospital stay, and can be life-threatening [1,2,3,4]. After induction of anesthesia, anesthesiologists are occupied with tasks such as tracheal intubation, adjusting anesthetic drug doses, fine-tuning ventilator settings, and documenting medical records, so PIH may be overlooked. It would therefore be beneficial to predict the risk of PIH in advance and to identify its associated risk factors.
Machine learning, a powerful predictive tool, has a broad range of applications in medicine. By comprehensively analyzing large volumes of preoperative and intraoperative data, machine learning models can identify significant factors linked to perioperative complications. Integrating machine learning models into electronic medical record systems could enable real-time risk assessment and prediction, thereby improving patient care and outcomes.
However, current models for PIH prediction often rely on complex machine learning algorithms and demanding data collection, making them susceptible to overfitting, especially when the dataset contains few patients. For instance, incorporating invasive arterial pressure data may improve prediction accuracy. However, such data are available only for high-risk patients undergoing specific procedures and cannot be generalized to the broader patient population. Non-invasive methods such as the pleth variability index (PVI), derived from pulse oximetry, provide insight into fluid responsiveness but require specific pulse oximeter devices [5]. Similarly, heart rate variability (HRV), obtained from electrocardiograms (ECG), adds analytical complexity, making it less suitable for widespread use [7]. Ultrasonography of the subclavian or axillary vein offers another non-invasive approach but depends on the availability of ultrasound equipment [6]. In contrast, this study uses routinely collected data from electronic medical records (EMRs) to predict PIH, eliminating the need for specialized diagnostic tools. We used the VitalDB open dataset, which contains routine clinical data, and applied multiple machine learning methods for modeling. We evaluated the models on discrimination, calibration, and clinical applicability, aiming to facilitate their practical implementation and interpretability in clinical settings.
This study aims to identify risk factors for PIH and enhance patient outcomes through the utilization of machine learning models. In cases where patients are at a higher risk, anesthesiologists can modify the anesthetic protocol, implement proactive fluid management strategies, and adjust medication dosages to mitigate the occurrence of PIH. By leveraging these models, anesthesia teams are provided with a practical tool for personalized preoperative risk assessment and tailored anesthesia management.
Methods
Data source
In this study, we obtained data from VitalDB, a publicly available repository of biosignal and clinical information collected from 6388 surgical patients during surgery [8]. The data cover the period from January 2005 to January 2014 and include patients undergoing non-cardiac surgery, such as general, thoracic, urologic, and gynecologic procedures. The dataset contains biosignals, including blood pressure, heart rate, and ventilator parameters. It also contains clinical information such as age, sex, BMI, type of surgery, and preoperative laboratory test results. This study has been reported in line with the STROCSS criteria [9].
Data pre-processing
To process the preoperative and intraoperative data, we applied systematic feature engineering:

- Ordered categorical variables were label-encoded to preserve their inherent order.
- Multicategory variables were converted to binary indicators via one-hot encoding.
- Continuous variables were standardized using z-scores to ensure uniform scaling across features.
- Variance filtering excluded features with low variability (variance < 0.01), as these contribute little to prediction.
- Correlation analysis was used to remove highly correlated features (|r| > 0.8), reducing multicollinearity and improving model interpretability.

PIH was defined as a mean arterial pressure (MAP) below 55 mmHg between the induction of anesthesia and the start of surgery. This threshold was based on previous studies that identified an association between MAP below 55 mmHg and postoperative adverse events [4, 10, 11]. Baseline blood pressure was defined as the first noninvasive blood pressure measurement recorded after operating room admission and before anesthesia induction. This yields baseline values unaffected by anesthetic agents and reduces the risk of data leakage. To improve data reliability, measurements outside the physiological range (MAP < 20 mmHg or > 160 mmHg) were excluded.
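The filtering steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline.

- The variance filter is applied before z-scoring. After standardization every continuous feature has variance 1, so the 0.01 cut-off would otherwise only affect indicator columns.
- When two features exceed |r| = 0.8, the one listed first is kept. The paper does not say which member of a correlated pair was dropped.
- All feature names and values are hypothetical.

```python
from statistics import mean, pstdev, pvariance

def label_encode(values, order):
    """Map an ordered category (e.g. ASA class) to integer ranks that keep its order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]

def one_hot(values, prefix):
    """Expand a multicategory variable into one 0/1 indicator column per level."""
    levels = sorted(set(values))
    return {f"{prefix}={lvl}": [1 if v == lvl else 0 for v in values] for lvl in levels}

def variance_filter(columns, min_var=0.01):
    """Drop columns whose population variance is below min_var."""
    return {name: vals for name, vals in columns.items() if pvariance(vals) >= min_var}

def pearson_r(x, y):
    """Pearson correlation using population moments."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def correlation_filter(columns, max_abs_r=0.8):
    """Keep a column only if |r| <= max_abs_r against every column kept before it."""
    kept = {}
    for name, vals in columns.items():
        if all(abs(pearson_r(vals, other)) <= max_abs_r for other in kept.values()):
            kept[name] = vals
    return kept

def zscore(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

# Toy cohort of five patients (illustrative values only).
columns = {
    "age": [45, 60, 72, 38, 55],
    "age_months": [540, 720, 864, 456, 660],   # redundant copy of age, r = 1
    "asa": label_encode(["III", "I", "II", "II", "I"], order=["I", "II", "III"]),
    "always_zero_flag": [0, 0, 0, 0, 0],       # no variability
    **one_hot(["general", "urology", "general", "thoracic", "general"], "surgery"),
}
selected = correlation_filter(variance_filter(columns))   # constant and redundant columns removed
scaled = {name: zscore(vals) for name, vals in selected.items()}
```

Running the variance filter first also guarantees that no zero-variance column reaches `pearson_r`, where it would cause a division by zero.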
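The outcome and baseline definitions above can be sketched as follows. The sketch assumes each case is available as a list of timestamped noninvasive MAP readings, plus induction and incision times. The `MapReading` structure and the time-base parameters are hypothetical and do not correspond to VitalDB's actual field names.

```python
from dataclasses import dataclass
from typing import List, Optional

MAP_MIN, MAP_MAX = 20.0, 160.0   # physiological plausibility range (mmHg)
PIH_THRESHOLD = 55.0             # PIH if MAP < 55 mmHg

@dataclass
class MapReading:
    t_sec: float      # seconds from case start (hypothetical time base)
    map_mmhg: float   # noninvasive mean arterial pressure

def plausible(r: MapReading) -> bool:
    """Readings outside 20-160 mmHg are treated as artifacts and ignored."""
    return MAP_MIN <= r.map_mmhg <= MAP_MAX

def baseline_map(readings: List[MapReading], induction_sec: float) -> Optional[float]:
    """First plausible MAP recorded before induction, or None if there is none."""
    before = sorted((r for r in readings if r.t_sec < induction_sec and plausible(r)),
                    key=lambda r: r.t_sec)
    return before[0].map_mmhg if before else None

def label_pih(readings: List[MapReading], induction_sec: float,
              surgery_start_sec: float) -> bool:
    """True if any plausible MAP between induction and surgery start is below 55 mmHg."""
    return any(
        plausible(r) and r.map_mmhg < PIH_THRESHOLD
        for r in readings
        if induction_sec <= r.t_sec <= surgery_start_sec
    )

case = [MapReading(0, 95), MapReading(300, 15), MapReading(600, 70),
        MapReading(900, 52), MapReading(1500, 60)]
print(baseline_map(case, induction_sec=500))                       # 95
print(label_pih(case, induction_sec=500, surgery_start_sec=1200))  # True
```

Because the baseline uses only readings taken before induction, no post-induction information leaks into the predictors.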
Missing value processing
In our study, we used median imputation to handle missing values in the dataset. Compared with the mean, the median is less influenced by outliers and skewed distributions, so the imputed values better represent a typical patient. When the proportion of missing values is small, single-value imputation of this kind distorts the feature distributions only slightly. We conducted a sensitivity analysis comparing models trained with median imputation against models trained after removing records with missing values (complete-case analysis).
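The imputation strategy and its complete-case comparison can be sketched as follows. One assumption goes beyond the text: medians are estimated on the training rows only and then reused for validation rows. This is a common safeguard against leakage that the paper does not explicitly describe. Column names are hypothetical.

```python
from statistics import median

def fit_medians(rows, columns):
    """Per-column median of the observed (non-missing) training values."""
    return {c: median(r[c] for r in rows if r[c] is not None) for c in columns}

def impute(rows, medians):
    """Return copies of rows with None replaced by the stored column median."""
    return [{**r, **{c: (m if r[c] is None else r[c]) for c, m in medians.items()}}
            for r in rows]

def complete_cases(rows, columns):
    """Sensitivity-analysis alternative: drop any row with a missing value."""
    return [r for r in rows if all(r[c] is not None for c in columns)]

features = ["albumin", "platelets"]
train = [
    {"albumin": 4.1, "platelets": 250, "pih": 0},
    {"albumin": None, "platelets": 180, "pih": 1},
    {"albumin": 3.5, "platelets": None, "pih": 0},
    {"albumin": 3.9, "platelets": 300, "pih": 0},
]
medians = fit_medians(train, features)      # reuse these for validation rows
train_imputed = impute(train, medians)
train_complete = complete_cases(train, features)
```

Fitting each model once on `train_imputed` and once on `train_complete`, then comparing validation AUROC, reproduces the shape of the sensitivity analysis.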
Machine learning models
We compared several machine learning models: logistic regression, random forest, XGBoost, and neural network models. These models are widely used for classification, each with its own advantages and disadvantages. To ensure the accuracy and stability of the models, we divided the dataset into training and validation sets in a 7:3 ratio and used five-fold cross-validation. During training, we performed a grid search to tune the model hyperparameters, aiming to improve prediction accuracy and generalization.
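The split-and-tune procedure can be sketched with the standard library alone. The paper does not state whether the 7:3 split was stratified by outcome, or how cross-validation and grid search were nested. This sketch therefore assumes:

- a stratified split;
- five-fold cross-validation run only inside the training set;
- a validation set that is left untouched during tuning.

`fit_and_score` is a hypothetical callback standing in for real model fitting. In practice, scikit-learn's `train_test_split(..., stratify=y)` and `GridSearchCV(estimator, param_grid, cv=5, scoring="roc_auc")` provide the same workflow.

```python
import itertools
import random

def stratified_split(labels, valid_frac=0.3, seed=42):
    """Split indices about 7:3 while keeping the PIH rate similar in both parts."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_valid = round(len(idx) * valid_frac)
        valid += idx[:n_valid]
        train += idx[n_valid:]
    return sorted(train), sorted(valid)

def stratified_folds(indices, labels, k=5, seed=42):
    """Shuffle each class, then deal it round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted({labels[i] for i in indices}):
        idx = [i for i in indices if labels[i] == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def grid_search(param_grid, train_idx, labels, fit_and_score, k=5):
    """Return (best_params, mean_cv_score) over the Cartesian product of param_grid.

    fit_and_score(params, fit_idx, val_idx) is a caller-supplied (hypothetical)
    function that trains a model on fit_idx and returns e.g. AUROC on val_idx.
    """
    folds = stratified_folds(train_idx, labels, k)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = []
        for f in range(k):
            fit_idx = [i for g in range(k) if g != f for i in folds[g]]
            scores.append(fit_and_score(params, fit_idx, folds[f]))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

labels = [1] * 17 + [0] * 83          # about 17% positives, similar to the cohort
train_idx, valid_idx = stratified_split(labels)
# Dummy scorer standing in for real model fitting: it peaks at C = 1.0.
best, score = grid_search({"C": [0.1, 1.0, 10.0]}, train_idx, labels,
                          lambda p, fit, val: -abs(p["C"] - 1.0))
```

Keeping the 30% validation set out of the search means the reported AUROC is not inflated by hyperparameter selection.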
Model performance and evaluation
To evaluate model performance, we used several metrics. Discrimination was assessed using the area under the receiver operating characteristic curve (AUROC). AUROC was chosen as the primary metric because it evaluates discrimination across all thresholds. Unlike accuracy, it does not depend on outcome prevalence, a key advantage in imbalanced clinical datasets. Accuracy, precision, recall, and F1-score were also calculated to provide a more nuanced assessment. For all models, the classification threshold was selected by maximizing Youden's index (sensitivity + specificity − 1), giving a consistent approach to balancing sensitivity and specificity. The 95% confidence intervals for performance metrics were calculated using the bootstrap method.
Calibration was evaluated to assess the agreement between predicted probabilities and observed outcomes. Calibration curves were plotted by comparing mean predicted probabilities with observed event rates across deciles of predicted risk; a well-calibrated model lies close to the diagonal reference line. Models were assessed using five-fold cross-validation. They were compared by visual inspection of the calibration curves and by Brier scores (the mean squared difference between predicted probabilities and observed outcomes; lower is better).
To determine the clinical utility of the models, decision curve analysis (DCA) was performed. Net benefit is calculated as follows:

$$\text{Net benefit} = \frac{TP}{N} - \frac{FP}{N} \times \frac{p}{1 - p}$$
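The three evaluation steps described above can be sketched in plain Python: a rank-based AUROC, a threshold chosen by Youden's index, and a percentile bootstrap confidence interval. This is an illustrative implementation, not the study's code. In particular, the number of bootstrap resamples (1000) and the percentile method are assumptions, since the paper does not report them. Resamples containing only one class are redrawn, because AUROC is undefined for them.

```python
import random

def auroc(y_true, scores):
    """Rank-based (Mann-Whitney) AUROC; tied scores get their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average 1-based rank of the tie block
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def youden_threshold(y_true, scores):
    """Cut-off t (predict PIH if score >= t) maximizing sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, y_true) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, y_true) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

def bootstrap_ci(y_true, scores, metric=auroc, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement, recompute metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:                      # AUROC needs both classes
            stats.append(metric(yb, [scores[i] for i in idx]))
    stats.sort()
    return stats[round(n_boot * alpha / 2)], stats[round(n_boot * (1 - alpha / 2)) - 1]

y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, s))             # 0.75
print(youden_threshold(y, s))  # (0.35, 0.5)
```

The rank formulation runs in O(n log n) per evaluation. This matters because the bootstrap recomputes AUROC on every resample.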
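The decile-based calibration table and the Brier score can be sketched as follows. The sketch uses equal-size bins of patients sorted by predicted risk. This is an assumption; bins of equal probability width are another common choice.

The last lines compute a no-skill reference. Predicting the cohort prevalence, 921/5406 ≈ 0.17, for every patient yields a Brier score of p(1 − p) ≈ 0.141. This gives a point of comparison for the Brier scores reported in the Results.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)

def calibration_table(y_true, probs, n_bins=10):
    """(mean predicted risk, observed PIH rate, n) for equal-size bins of sorted risk."""
    pairs = sorted(zip(probs, y_true))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            observed = sum(y for _, y in chunk) / len(chunk)
            table.append((mean_pred, observed, len(chunk)))
    return table

# Predicting the cohort prevalence (921/5406) for everyone gives Brier = p(1 - p).
y_cohort = [1] * 921 + [0] * (5406 - 921)
prevalence = 921 / 5406
print(round(brier_score(y_cohort, [prevalence] * 5406), 4))  # 0.1413
```

Plotting `mean_pred` against `observed` for each bin gives the calibration curve. Points on the diagonal indicate good calibration.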
where TP and FP represent true positives and false positives, N is the total sample size, and p is the threshold probability.
DCA evaluates the net benefit across different threshold probabilities, reflecting the trade-offs between true positives and false positives in a clinical context. This metric helps in understanding the practical implications of implementing the predictive models in real-world clinical decision-making.
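A decision curve can be sketched by evaluating net benefit, TP/N − (FP/N) × p/(1 − p), over a grid of threshold probabilities p. The curve covers three strategies: the model, "treat all", and "treat none". This is a minimal illustration with toy data. "Treat all" is obtained by assigning every patient a predicted risk of 1.0, which makes TP equal to the number of PIH cases and FP equal to the number of non-cases. "Treat none" always has zero net benefit.

```python
def net_benefit(y_true, probs, pt):
    """Net benefit = TP/N - FP/N * pt/(1 - pt), treating patients with risk >= pt."""
    n = len(y_true)
    tp = sum(1 for p, y in zip(probs, y_true) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(probs, y_true) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)

def decision_curve(y_true, probs, thresholds):
    """Rows of (pt, model, treat-all, treat-none) net benefit for plotting a DCA."""
    all_positive = [1.0] * len(y_true)   # every patient exceeds any pt < 1
    return [(pt, net_benefit(y_true, probs, pt), net_benefit(y_true, all_positive, pt), 0.0)
            for pt in thresholds]

y = [1, 0, 1, 0]
p = [0.9, 0.8, 0.3, 0.1]
for row in decision_curve(y, p, [0.2, 0.5]):
    print(row)
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all and treat-none curves.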
By considering these evaluation metrics, we aimed to assess the predictive ability, stability, and applicability of the models. Ultimately, these assessments allowed us to identify the best-performing machine learning model, which was selected as the final predictive model.
Results
Dataset characteristics
A total of 5,406 patients were included in the study, of which 921 patients, accounting for 17% of the total, developed post-induction hypotension (Fig. 1).
We extracted 36 features from all clinical variables. Among these features, 13 had missing data. The percentage of missing data for each variable was less than 10%. Complete data without any missing values for all features were available for 88.6% of patients in the dataset.
The clinical characteristics of patients with and without PIH are summarized in Table 1. Most characteristics differed significantly between the two groups. Patients who developed PIH tended to be older and female, to have a lower BMI, and to have lower preoperative hemoglobin and albumin levels.
Model performance
The AUROC was 0.74 (95% CI, 0.71–0.77) for the logistic regression model, 0.71 (95% CI, 0.68–0.74) for the random forest model, 0.72 (95% CI, 0.69–0.75) for the XGBoost model, and 0.72 (95% CI, 0.68–0.75) for the neural network model (Fig. 2). Differences between the four models were not statistically significant. Secondary metrics such as accuracy, precision, recall, and F1-score varied across models, with logistic regression maintaining a balanced trade-off (Table 2). In the sensitivity analysis, removing records with missing values instead of applying median imputation did not significantly change the performance of any of the four models.
The calibration curves showed that both the logistic regression and random forest models were well calibrated, with predicted probabilities closely matching observed probabilities. Brier scores were as follows: logistic regression (0.1291), XGBoost (0.1305), random forest (0.1312), and neural network (0.1328). Based on Brier scores and calibration curves, logistic regression showed the best calibration, followed by XGBoost, random forest, and neural network (Fig. 3).
Regarding the clinical benefit, the DCA curves indicated that the logistic regression model provided the highest clinical benefit compared to the other models (Fig. 4). This suggests that in the clinical prediction of PIH, logistic regression may offer a more favorable trade-off between reducing unnecessary interventions and avoiding missed diagnoses.
Considering the overall performance and interpretability, we selected the logistic regression model as the final prediction model for model interpretation.
Model interpretation
In the logistic regression model, feature importance was determined by the absolute value of each feature's coefficient, which is intuitively interpretable: a larger absolute coefficient indicates a greater contribution to the model's prediction. Because continuous variables were z-score standardized, their coefficients are on a comparable scale. By ranking coefficient magnitudes, we identified the 10 most important features (Fig. 5). In descending order of importance, these were baseline diastolic blood pressure, age, sex, type of surgery, baseline systolic blood pressure, platelet count, rocuronium bromide use, urea nitrogen, albumin, and fentanyl use.
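The sketch below shows coefficient-based ranking and how a fitted logistic model turns features into a PIH probability. The coefficients are arbitrary placeholders, not the values fitted in this study. The `_z` suffix is a hypothetical naming convention marking z-scored continuous features. Standardized continuous coefficients describe a change of 1 SD, while binary coefficients describe a change from 0 to 1. Because these scales differ, magnitude rankings should be read with that caveat.

```python
import math

def predict_proba(intercept, coefs, x):
    """Logistic model: P(PIH) = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    z = intercept + sum(b * x.get(name, 0.0) for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_by_abs_coef(coefs, k=10):
    """Feature names ordered by |coefficient|, largest first."""
    return sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)[:k]

# Placeholder coefficients for illustration only; NOT the fitted values from this study.
coefs = {"baseline_dbp_z": -0.60, "age_z": 0.45, "female": 0.30,
         "platelets_z": -0.15, "fentanyl": -0.10}
print(rank_by_abs_coef(coefs, k=3))   # ['baseline_dbp_z', 'age_z', 'female']
patient = {"baseline_dbp_z": -1.0, "age_z": 1.5, "female": 1}
print(round(predict_proba(-1.6, coefs, patient), 3))
```

The sign of each coefficient gives the direction of association. For example, a negative coefficient for baseline blood pressure means higher values lower the predicted risk.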
Discussion
This study developed a machine learning model to predict post-induction hypotension (PIH) using data from 5406 patients. Among the models tested, logistic regression performed best overall, achieving an AUROC of 0.74 (95% CI, 0.71–0.77). In previous studies, logistic regression did not outperform other machine learning models [12,13,14,15]. Models such as XGBoost and neural networks often outperform logistic regression in capturing non-linear relationships and interactions among variables [12]. However, their greater computational demands and need for careful hyperparameter tuning may hinder practical implementation in real-time clinical settings. Similarly, random forest offers high predictive performance with minimal parameter tuning, but it has been observed to be prone to overfitting on small or sparse datasets [15]. In this study, logistic regression achieved AUROC values comparable to those of the other models. Its strengths in calibration and clinical utility, together with its ease of clinical application, highlight its overall performance. This result may be attributable to class imbalance in the dataset: more complex models are prone to overfitting the majority class, reducing their sensitivity to positive samples. Logistic regression, with its simpler structure, has been reported as the most sensitive classifier for imbalanced datasets in software defect prediction [16]. This may explain its more robust performance across metrics.
In decision curve analysis, the choice of threshold reflects the clinical preference for minimizing false positives versus false negatives. Given the nature of PIH, where overtreatment could pose unnecessary risks to patients and missed cases might delay timely interventions, selecting thresholds that favor higher precision may align better with clinical priorities. Future studies could explore different threshold ranges to evaluate whether alternative trade-offs might yield improved predictive utility under varying clinical contexts.
The model interpretation provides insight into the features contributing most to PIH prediction. One of the most important is baseline blood pressure, consistent with previous findings [13, 15, 17,18,19,20]. Higher baseline blood pressure tends to indicate a lower risk of hypotension during surgery, whereas lower baseline blood pressure may increase the risk. However, other studies have suggested that high baseline blood pressure is a risk factor for PIH [21, 22]. In those two studies, the outcome was defined as a percentage decrease in blood pressure relative to baseline, which may account for the opposite findings. There is currently no universally accepted definition of PIH, with studies using thresholds of MAP < 55 mmHg, MAP < 60 mmHg, or MAP < 65 mmHg [23,24,25]. In this study, we chose the stricter threshold of MAP < 55 mmHg to focus on cases with more severe hemodynamic changes, which are more likely to be clinically significant. This approach highlights severe PIH, but we acknowledge that different thresholds might change the reported incidence and clinical interpretation of PIH. Additionally, using baseline MAP alone to predict PIH yielded an AUROC of 0.65, with poorer calibration and DCA curves than the full model (Supplementary materials).
Basic patient characteristics such as age and sex may also influence the occurrence of PIH. It is generally accepted that older patients are prone to PIH [6, 15, 17,18,19, 22], but the effect of sex remains controversial [6, 12, 26]. Furthermore, laboratory indicators including platelet count, urea nitrogen, and albumin were identified as significant features. These indicators reflect the patient's hematologic status and renal function. Direct evidence linking them to hypotension during anesthesia is limited. Nevertheless, abnormal values may indicate disturbances in metabolic status and fluid balance that could contribute to PIH. Anesthesia induction drugs affect blood pressure, so incorporating this information into the model may help predict PIH risk. Rocuronium has no additional clinically relevant effects on cardiovascular dynamics, although a transient decrease in blood pressure may be observed during its infusion [27, 28]. In contrast, fentanyl use was negatively associated with PIH, likely because of its hemodynamic stability compared with other agents [29].
Several methods for predicting PIH with specialized equipment have been reported. For example, the pleth variability index, which automatically estimates respiratory variability, has a sensitivity of 0.79 and a specificity of 0.71 for predicting PIH [5]. Heart rate variability analysis has an AUROC of 0.70 [7]. In addition, a model trained on vital signs recorded 4 to 1 min before intubation achieved an accuracy of up to 0.72 [20]. The model constructed in this study uses only data routinely collected in electronic medical records, eliminating the need for specialized diagnostic equipment or specially trained personnel. This approach may sacrifice some accuracy, but it balances practicality and precision, providing a widely applicable and scalable tool for clinical risk stratification.
This study has some limitations. First, the use of a single-institution database may introduce bias and limit the generalizability of the findings to other populations or healthcare settings. Second, relying on noninvasive rather than invasive blood pressure measurements could introduce measurement error and capture blood pressure dynamics less accurately. Third, detailed information on vasoactive drug use was lacking, and the limited data on administration and dosing may have affected the classification of outcome events and the predictive accuracy of the models. Finally, the study focused mainly on preoperative features readily available in electronic medical records. These features are easily accessible, but omitting intraoperative features such as monitoring waveform data, intubation-related data, and ventilator parameters may have limited the prediction of late PIH. To improve the robustness and applicability of the predictive model, future studies could use multicenter datasets and incorporate more intraoperative features.
Conclusions
This study provides a feasible machine learning model for predicting PIH and an insight into risk factors for PIH. The developed model can serve as a basis for future research and clinical practice, fostering advancements in personalized medicine and enhancing patient safety.
Data availability
The datasets generated and/or analysed during the current study are available in the VitalDB open dataset repository, https://vitaldb.net/dataset/.
Abbreviations
- PIH: Post-induction hypotension
- MAP: Mean arterial pressure
- AUROC: Area under the receiver operating characteristic curve
- DCA: Decision curve analysis
- SHAP: Shapley additive explanation
References
Bijker JB, Persoon S, Peelen LM, Moons KGM, Kalkman CJ, Kappelle LJ, et al. Intraoperative hypotension and perioperative ischemic stroke after general surgery: a nested case-control study. Anesthesiology. 2012;116:658–64.
Sessler DI, Bloomstone JA, Aronson S, Berry C, Gan TJ, Kellum JA, et al. Perioperative Quality Initiative consensus statement on intraoperative blood pressure, risk and outcomes for elective surgery. Br J Anaesth. 2019;122:563–74.
Duan W, Zhou C-M, Yang J-J, Zhang Y, Li Z-P, Ma D-Q, et al. A long duration of intraoperative hypotension is associated with postoperative delirium occurrence following thoracic and orthopedic surgery in elderly. J Clin Anesth. 2023;88:111125.
Walsh M, Devereaux PJ, Garg AX, Kurz A, Turan A, Rodseth RN, et al. Relationship between intraoperative mean arterial pressure and clinical outcomes after noncardiac surgery: toward an empirical definition of hypotension. Anesthesiology. 2013;119:507–15.
Tsuchiya M, Yamada T, Asada A. Pleth variability index predicts hypotension during anesthesia induction. Acta Anaesthesiol Scand. 2010;54:596–602.
Choi MH, Chae JS, Lee HJ, Woo JH. Pre-anaesthesia ultrasonography of the subclavian/infraclavicular axillary vein for predicting hypotension after inducing general anaesthesia: A prospective observational study. Eur J Anaesthesiol. 2020;37:474–81.
Hanss R, Renner J, Ilies C, Moikow L, Buell O, Steinfath M, et al. Does heart rate variability predict hypotension and bradycardia after induction of general anaesthesia in high risk cardiovascular patients? Anaesthesia. 2008;63:129–35.
Lee H-C, Park Y, Yoon SB, Yang SM, Park D, Jung C-W. VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients. Sci Data. 2022;9:279.
Mathew G, Agha R, Albrecht J, Goel P, Mukherjee I, Pai P, et al. STROCSS 2021: Strengthening the reporting of cohort, cross-sectional and case-control studies in surgery. Int J Surg. 2021;96:106165.
Salmasi V, Maheshwari K, Yang D, Mascha EJ, Singh A, Sessler DI, et al. Relationship between intraoperative hypotension, defined by either reduction from baseline or absolute thresholds, and acute kidney and myocardial injury after noncardiac surgery: a retrospective cohort analysis. Anesthesiology. 2017;126:47–65.
Wesselink EM, Kappen TH, Torn HM, Slooter AJC, van Klei WA. Intraoperative hypotension and the risk of postoperative adverse outcomes: a systematic review. Br J Anaesth. 2018;121:706–21.
Lin C-S, Chang C-C, Chiu J-S, Lee Y-W, Lin J-A, Mok MS, et al. Application of an artificial neural network to predict postinduction hypotension during general anesthesia. Med Decis Making. 2011;31:308–14.
Kendale S, Kulkarni P, Rosenberg AD, Wang J. Supervised machine-learning predictive analytics for prediction of postinduction hypotension. Anesthesiology. 2018;129:675–88.
Zhou C-M, Xue Q, Liu P, Duan W, Wang Y, Tong J, et al. Construction of a predictive model of post-intubation hypotension in critically ill patients using multiple machine learning classifiers. J Clin Anesth. 2021;72:110279.
Kang AR, Lee J, Jung W, Lee M, Park SY, Woo J, et al. Development of a prediction model for hypotension after induction of anesthesia using machine learning. PLoS One. 2020;15:e0231172.
Tantithamthavorn C, Hassan AE, Matsumoto K. The impact of class rebalancing techniques on the performance and interpretation of defect prediction models. IEEE Trans Software Eng. 2020;46:1200–19.
Li X-F, Huang Y-Z, Tang J-Y, Li R-C, Wang X-Q. Development of a random forest model for hypotension prediction after anesthesia induction for cardiac surgery. World J Clin Cases. 2021;9:8729–39.
Reich DL, Hossain S, Krol M, Baez B, Patel P, Bernstein A, et al. Predictors of hypotension after induction of general anesthesia. Anesth Analg. 2005;101:622–8.
Südfeld S, Brechnitz S, Wagner JY, Reese PC, Pinnschmidt HO, Reuter DA, et al. Post-induction hypotension and early intraoperative hypotension associated with general anaesthesia. Br J Anaesth. 2017;119:57–64.
Lee J, Woo J, Kang AR, Jeong Y-S, Jung W, Lee M, et al. Comparative analysis on machine learning and deep learning to predict post-induction hypotension. Sensors (Basel). 2020;20:4575.
Zhang J, Critchley LAH. Inferior Vena Cava ultrasonography before general anesthesia can predict hypotension after induction. Anesthesiology. 2016;124:580–9.
Jor O, Maca J, Koutna J, Gemrotova M, Vymazal T, Litschmannova M, et al. Hypotension after induction of general anesthesia: occurrence, risk factors, and therapy. A prospective multicentre observational study. J Anesth. 2018;32:673–80.
Patti R, Saitta M, Cusumano G, Termine G, Di Vita G. Risk factors for postoperative delirium after colorectal surgery for carcinoma. Eur J Oncol Nurs. 2011;15:519–23.
Maheshwari K, Ahuja S, Khanna AK, Mao G, Perez-Protto S, Farag E, et al. Association between perioperative hypotension and delirium in postoperative Critically Ill patients: a retrospective cohort analysis. Anesth Analg. 2020;130:636–43.
Guo Z, Liu J, Li J, Wang X, Guo H, Ma P, et al. Postoperative delirium in severely burned patients undergoing early escharotomy: incidence, risk factors, and outcomes. J Burn Care Res. 2017;38:e370–6.
Tarao K, Daimon M, Son K, Nakanishi K, Nakao T, Suwazono Y, et al. Risk factors including preoperative echocardiographic parameters for post-induction hypotension in general anesthesia. J Cardiol. 2021;78:230–6.
Kosciuczuk U, Gluszynska P, Diemieszczyk I, Lukaszewicz A, Bauer K, Kokoszko M, et al. Effect of rocuronium on the heart rate and arterial blood pressure during combined general anaesthesia. Disaster Emerg Med J. 2021;6:104–11.
Saugel B, Bebert E-J, Briesenick L, Hoppe P, Greiwe G, Yang D, et al. Mechanisms contributing to hypotension after anesthetic induction with sufentanil, propofol, and rocuronium: a prospective observational study. J Clin Monit Comput. 2022;36:341–7.
Miller DR, Wellwood M, Teasdale SJ, Laidley D, Ivanov J, Young P, et al. Effects of anaesthetic induction on myocardial function and metabolism: a comparison of fentanyl, sufentanil and alfentanil. Can J Anaesth. 1988;35:219–33.
Acknowledgements
We would like to thank the VitalDB dataset for providing high-quality intraoperative biosignals and clinical information.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author information
Contributions
CM conducted data collection and manuscript writing. ZD contributed to the study design, guiding the analysis, and interpreting the data. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The acquisition and release of the data was approved by the Institutional Review Board of Seoul National University Hospital (H-1408–101-605). The study was also registered at clinicaltrials.gov (NCT02914444).
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chen, M., Zhang, D. Machine learning-based prediction of post-induction hypotension: identifying risk factors and enhancing anesthesia management. BMC Med Inform Decis Mak 25, 96 (2025). https://doi.org/10.1186/s12911-025-02930-y
DOI: https://doi.org/10.1186/s12911-025-02930-y