A Decision Tree-Driven IoT systems for improved pre-natal diagnostic accuracy

Yang, Xuewen; Liu, Ling; Wang, Yan

doi:10.1186/s12911-024-02759-x

Research
Open access
Published: 05 December 2024

BMC Medical Informatics and Decision Making volume 24, Article number: 375 (2024)

Abstract

Prenatal diagnostics are vital for both the mother and her unborn baby. They enable early identification of possible complications so that initial measures to improve maternal and fetal health can be taken. Over the years, various techniques have been used to diagnose genetic disorders before birth, but they are often costly, time-consuming, and dependent on access to modern health facilities. To overcome these problems, this paper proposes a diagnostic model that integrates the Internet of Things (IoT) with a machine learning approach, the Decision Tree Algorithm (DTA). First, IoT devices collect vital information such as heart rate, blood pressure, glucose levels, and fetal movement. The data are structured into a dataset and transmitted to Big Data storage for warehousing and processing. Second, the DTA analyzes the data to find patterns and estimate the likelihood of future health complications. The DTA divides the dataset into subsets based on specific features and builds a tree-like model of decisions. At every node, the algorithm chooses the attribute with the highest information gain to partition the data into classes. This process continues until a leaf node is reached, at which the algorithm predicts probable health problems from the input data. To increase reliability, the model is fine-tuned on a large database of pre-natal health records. The system collects data in real time and flags abnormal readings for the attention of health professionals. The methodology was tested on a database of 1000 pre-natal health records, where the proposed approach achieved 95% accuracy in identifying potential health problems, compared with 85% for classical statistical analysis.
Furthermore, the system reduced false positives by 20% and false negatives by 15%, demonstrating its efficacy.

Introduction

Pre-natal diagnostics are among the most important tests performed during pregnancy, as they help avoid many health problems for both mother and child [1]. They comprise several procedures intended to identify possible complications as early as possible so that they can be treated before they escalate. However, conventional pre-natal diagnostic techniques have several problems that can significantly limit their efficiency and applicability in different contexts [2]. Established procedures such as sonography and biochemical and genetic analyses are proven but carry disadvantages and risks. They usually require referral to specialized health facilities with costly imaging and diagnostic tools [3]. Patients in rural areas or served by understaffed hospitals may not easily reach these facilities, which can delay diagnosis and treatment. In addition, the expertise needed to interpret results may be scarce, particularly where qualified healthcare personnel are rare [4]. Cost is a further barrier to universal adoption: ultrasound equipment, laboratory tests, and consultations can be expensive, often prohibitively so for poor families. As a result, expectant mothers may receive screenings late or not at all, leaving diseases untreated and leading to poor maternal and fetal health [5].

To address these issues, approaches combining IoT and Decision Tree Algorithms offer a promising solution. Devices such as wearable sensors and remote monitors provide continuous and accurate records of maternal health, fetal growth, and conditions in the surrounding environment [6]. The resulting data are more detailed and can reveal problems or bodily changes that pose health risks earlier than routine examinations. The Decision Tree Algorithm (DTA) is a machine learning algorithm that can handle large, extensive datasets and make decisions based on learned patterns, and it has been shown to improve diagnostic accuracy [7]. The specific technique applied here is Classification and Regression Trees (CART), which iteratively splits the dataset on criteria such as maternal age, vital signs, and fetal health [8]. The algorithm builds a tree-like structure in which each internal node is a test on an attribute, each branch connecting two nodes is an outcome of that test, and the end nodes, or leaves, are class labels such as normal or abnormal health condition [9]. The depth of the tree determines its complexity and governs the trade-off between model simplicity and accuracy given the complexity of the dataset. In the proposed study, the maximum tree depth is selected through cross-validation, which helps prevent overfitting and maintain predictive performance on new data [10]. This generalization capability allows the decision tree to identify diverse patterns in pre-natal health while reducing model complexity. By analyzing data acquired in real time by IoT devices, decision trees can detect patterns that foreshadow different pre-natal health conditions [11].
This predictive capability allows providers to intervene early, improving care for mother and fetus and potentially reducing adverse outcomes. This paper motivates and describes the framework of a Decision Tree-Driven IoT system for pre-natal diagnosis. The system uses IoT devices to collect data continuously and transfer it to a central control unit. The DTA is trained to classify the pre-natal health data and predict the mother's health condition from the collected data. The objectives of this study are to implement the integrated system and to evaluate how effectively it improves pre-natal diagnostic accuracy compared with traditional methods. The major contributions of the study are as follows:

Implement IoT devices for real-time data collection in pre-natal care settings to ensure accessible and continuous monitoring.
Develop and train a decision tree algorithm that analyzes the collected data and predicts health conditions with high accuracy and reliability.
Evaluate the performance of the DTA-driven IoT system in real-world pre-natal care scenarios to assess its impact on diagnostic outcomes.
Compare the effectiveness and efficiency of the integrated system with traditional pre-natal diagnostic methods, highlighting its advantages and potential areas for improvement.

The remainder of this paper is organized as follows. Section 2 reviews related work, describing past research on the subject along with its achievements and open issues. Section 3 details the methodology used to develop the IoT system, train the decision tree algorithm, and validate its performance. Section 4 presents the results of the study, discussing how the integrated system performs in practice and its potential implications for improving pre-natal care worldwide. Section 5 concludes the study and outlines challenges, limitations, and future research directions in IoT-based pre-natal diagnostics.

Literature review

Advances in IoT and ML algorithms have ushered in a new era of personalized, data-driven healthcare applications that offer new opportunities to improve the accuracy, efficiency, and accessibility of prenatal care. IoT technologies further extend the benefits of these algorithms by collecting real-time data and storing it for analysis and decision-making. Among these algorithms, DTA models have emerged as powerful tools for interpreting complex medical data and supporting clinical decision-making in prenatal diagnostics. The work in [12] explored IoT applications in healthcare and emphasized their transformative impact on remote patient monitoring and personalized medicine. It uses IoT devices such as wearable sensors and medical implants to collect real-time data on patients' vital signs, activity levels, and medication adherence. These data are transmitted securely to healthcare providers, who analyze them for continuous monitoring and proactive interventions, improving patient outcomes and reducing healthcare costs.

In addition, the study in [13] demonstrated the efficiency of decision tree algorithms in medical image analysis. The DTA recursively partitions the data using image features derived from medical scans such as MRI or X-ray images. In this way, decision trees help radiologists and clinicians diagnose conditions such as tumors, fractures, or other abnormalities. The article [14] addressed the integration of IoT devices into prenatal care, focusing on improved maternal-fetal monitoring and early identification of complications. Portable devices and wireless technology track factors such as the mother's pulse, the baby's movements, and contractions. Because real-time data are transmitted promptly, pregnancies can be monitored remotely, and complications are detected early enough for intervention, improving maternal health and reducing the impact of unfavorable fetal conditions. Similarly, the study [15] surveyed the literature on machine learning techniques and decision tree algorithms in healthcare. In a decision tree, every internal node is a decision that depends on the input variables (e.g., patient demographics or complaints). This structure stands out in medical data analysis because of its readability and its versatility in handling many data formats when diagnosing diseases, classifying patients' risk levels, and determining the best therapy. The authors of [16] examined IoT and cloud computing in healthcare systems and found the two to be complementary. Connected devices gather large volumes of patient-generated health data, which are stored and analyzed in the cloud. Cloud computing therefore offers healthcare organizations a cost-effective way to store and process very large datasets.
By connecting different healthcare providers and medical organizations, this integration enables real-time patient monitoring, personalized delivery of healthcare services, and data sharing between providers, improving both the quality of healthcare services and patient satisfaction.

Prenatal diagnostics also play a central role in diagnosing birth-related disorders. Article [17] used the Decision Tree Algorithm to classify preterm births based on maternal health status and IoT datasets. Using attributes such as the mother's age, medical history, and current physiological indices from mobile devices, including blood pressure and fetal heart rate, decision trees identify high-risk pregnancies prone to preterm delivery. This enables healthcare providers to recognize such conditions early, and early intervention can prevent or minimize adverse effects on the baby. Similarly, the study [18] investigated the reliability of IoT-based fetal monitoring systems in identifying fetal distress. Such systems continuously detect abrupt changes in fetal heart rate variability, fetal movements, and fetal blood oxygen levels. Real-time data analysis notifies healthcare providers whenever indicators of fetal distress are detected, so that risks can be managed as effectively as possible and perinatal health is improved. The research in [19] explored ensemble learning methods such as random forests in prenatal screening. Ensemble methods combine multiple decision trees to improve prediction accuracy by reducing bias and variance. Applied to prenatal diagnostics, random forests integrate genetic data, maternal health factors, and fetal ultrasound findings to identify high-risk pregnancies and guide clinical management decisions. Similarly, the study [20] applied deep learning algorithms, specifically convolutional neural networks (CNNs), to fetal ultrasound image analysis. CNNs capture hierarchical features in ultrasound images and enable efficient diagnosis of fetal anomalies. Using deep learning features, clinicians can extract rich information from imaging data for prenatal diagnosis and treatment.
The authors of [21] employed Bayesian networks to predict risk factors for maternal and fetal health outcomes. Bayesian networks model the probabilistic interactions between variables such as maternal age, lifestyle, and health risks. By capturing these complex relationships, they provide extensive information on prenatal health conditions, supporting risk estimation and individualized management. Similarly, the study [22] examined machine learning for prenatal diagnosis using SVMs and neural networks. These techniques predict fetal risks, such as congenital abnormalities or genetic disorders, from genetic analysis, maternal records, and fetal ultrasound. Combining data from multiple sources enriches the models and increases the likelihood of an accurate prenatal diagnosis. The work in [23] compared the scope and accuracy of decision tree models in healthcare, where these algorithms scan patient data for patterns, forecast future health status, and assist clinicians in making decisions.

The studies reviewed above compare the effectiveness of IoT and ML algorithms in diagnosing various prenatal complications and describe several difficulties that arise in prenatal diagnosis and maternal-fetal surveillance. Among these is predicting pregnancy complications such as preterm birth and fetal distress from inputs such as maternal health indicators and fetal monitoring data. DTA outperforms the other approaches in these tasks because it detects and analyzes intricate data patterns and produces explainable results. By partitioning data into subsets according to the relevant features, decision trees help define health conditions accurately and predict outcomes.

Methodology

The proposed study is implemented in a stepwise procedure. The first step defines the system architecture: selecting the study participants, deploying the IoT devices, and establishing a communication network among them. The second step collects real-time data from the sensors and stores it in a big data database for analysis. The third step makes the data uniform and consistent using various filtering techniques. Once the data are prepared, a DTA is constructed and trained on a subset of the collected data. Finally, the model's hyperparameters are tuned and the model is tested on held-out data. Figure 1 shows the process flow of the proposed study.

System model

The proposed system architecture consists of three main components: IoT devices, a central processing unit, and a decision tree algorithm. Together, they provide a continuous, up-to-date pre-natal diagnostic service. The IoT devices are always on and acquire data continuously so that crucial information is not missed. The central processing unit plays a vital role, as it is responsible for transmitting, storing, and preprocessing the data. The decision tree algorithm identifies patterns in the data and the probable health concerns they indicate. This combined approach enables the system to deliver accurate and timely diagnostic information.

IoT deployment

The proposed system deploys a comprehensive range of IoT devices, including wearable sensors, smart medical equipment, and home-based monitoring systems. These devices are designed for continuous, non-intrusive monitoring of expectant mothers, with multiple sensors covering different health parameters. The deployment includes five wearable sensors, such as smartwatches like the Apple Watch, fitness bands such as Fitbit, and specialized medical wearables like the BioPatch [24]. In addition, three smart medical devices are used: digital blood pressure monitors, continuous glucose monitors such as the Dexcom G6, and fetal heart rate monitors. Furthermore, the home-based monitoring system comprises four devices, including smart scales, temperature sensors, and ambient monitoring devices. Together, these devices provide contextual data valuable for comprehensive monitoring. Let D be the set of deployed devices:

$$D=\{d_1, d_2, d_3, \ldots, d_9\}$$

(1)

In Equation (1), d_i represents a specific device used for data recording. Let T be the set of measured parameter types:

$$T=\{T_1, T_2, T_3, \ldots, T_9\}$$

(2)

In Equation (2), T_i represents a specific parameter being recorded. A binary matrix M captures the relationship between devices and parameters. Since there are 9 devices and 9 parameters, M is a 9 × 9 matrix in which M_ij = 1 if device d_i measures parameter T_j and M_ij = 0 otherwise. The devices are deployed in various locations to ensure comprehensive monitoring. At home, wearable sensors and home-based monitoring systems allow continuous real-time data collection without frequent visits to healthcare facilities. In clinical settings, smart medical equipment provides more detailed and precise measurements during scheduled check-ups. Let L be the set of deployment locations:

$$L=\{\text{On-body},\ \text{Home},\ \text{Clinical}\}$$

(3)

A vector V records the deployment location of each device. Wearable sensors can be worn throughout the day regardless of the expectant mother's location, ensuring that no critical data is missed, and this multi-location deployment provides comprehensive coverage. To meet these monitoring objectives, each type of IoT device serves a specific function. Wearable sensors such as the Apple Watch and Fitbit continuously monitor vital signs such as heart rate, blood pressure, and physical activity using embedded sensors [25], and transmit the data wirelessly to the central processing unit. Smart medical equipment, such as digital blood pressure monitors and the Dexcom G6 glucose monitor, provides precise measurements; these devices are often used during clinical visits but can also be used at home for regular monitoring. Home-based monitoring systems, such as smart scales and temperature sensors, track weight changes and environmental factors such as temperature. These systems use sensors placed around the home to collect data and provide a more comprehensive view of the expectant mother's health. Table 1 details the different IoT devices, their locations, and the purpose of their deployment.
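To make the binary matrix M and the location vector V concrete, the sketch below builds both from a device-to-parameter mapping. The device names, parameter names, and assignments are hypothetical and cover only four devices and five parameters; the study's actual 9 × 9 matrix would be populated from Table 2, which is not reproduced here.

```python
# Hypothetical devices and parameters for illustration only; the study's
# real 9 x 9 matrix M would be filled in from Table 2.
PARAMETERS = ["heart_rate", "blood_pressure", "glucose", "fetal_heart_rate", "temperature"]
LOCATIONS = {"On-body", "Home", "Clinical"}  # the set L of Equation (3)

DEVICE_PARAMS = {
    "smartwatch": ["heart_rate", "temperature"],
    "bp_monitor": ["blood_pressure", "heart_rate"],
    "cgm": ["glucose"],
    "fetal_doppler": ["fetal_heart_rate"],
}
DEVICE_LOCATION = {
    "smartwatch": "On-body",
    "bp_monitor": "Clinical",
    "cgm": "On-body",
    "fetal_doppler": "Home",
}


def build_matrix(device_params, parameters):
    """Return (devices, M) with M[i][j] = 1 if device i measures parameter j, else 0."""
    index = {p: j for j, p in enumerate(parameters)}
    devices = list(device_params)  # dict insertion order fixes the row order
    m = [[0] * len(parameters) for _ in devices]
    for i, d in enumerate(devices):
        for p in device_params[d]:
            m[i][index[p]] = 1
    return devices, m


def build_location_vector(devices, device_location):
    """Return V, whose i-th entry is the deployment location of device i."""
    v = [device_location[d] for d in devices]
    unknown = set(v) - LOCATIONS
    if unknown:
        raise ValueError(f"locations not in L: {unknown}")
    return v


devices, M = build_matrix(DEVICE_PARAMS, PARAMETERS)
V = build_location_vector(devices, DEVICE_LOCATION)
```

Row sums of M give the number of parameters each device covers, and column sums show how many devices observe each parameter, which is a quick way to spot parameters that depend on a single device.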

Table 1 Types of sensors, their purpose, and location of deployment

The deployed IoT devices collect a wide range of data continuously and non-invasively. Biosensors and smart health equipment record pulse, blood pressure, breathing rate, and temperature. They include continuous glucose monitors, which track blood glucose over extended periods and play a crucial role in managing gestational diabetes. Other monitors focus on fetal movements and heartbeat, which indicate the fetus's status. Home monitoring systems provide data on temperature, humidity, and other environmental factors that affect the occupant's health. This diverse data collection gives the system all-round information on the expectant mother's health status. Table 2 lists the types of devices, their measured parameters, and their deployment sites in the proposed pre-natal diagnostics IoT system.

Table 2 Representation of Devices, locations and parameters

Figure 2 Heatmap of the devices, their locations, and the parameters they measure

Heart rate is measured in beats per minute (bpm) using photoplethysmography (PPG) sensors in wearable devices. Blood pressure is recorded in millimeters of mercury (mmHg) using the oscillometric method in smart medical equipment. Respiratory rate is measured in breaths per minute using bioimpedance or respiratory inductance sensors. Body temperature is measured in degrees Celsius or Fahrenheit using thermistors or infrared sensors. Glucose levels are measured in milligrams per deciliter (mg/dL) using electrochemical sensors. Fetal heart rate and movements are monitored using Doppler ultrasound and accelerometers, respectively. Environmental data are collected using temperature and humidity sensors. Table 3 lists the data and their measurement units.
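Because the same quantity can arrive in different units (body temperature in °C or °F, and some glucose meters report mmol/L), readings would need to be mapped to one canonical unit before they are pooled. The sketch below assumes degrees Celsius and mg/dL as canonical; the helper names are hypothetical, and the glucose factor of about 18.016 mg/dL per mmol/L follows from glucose's molar mass of roughly 180.16 g/mol.

```python
# Assumed conversion factor: 1 mmol/L of glucose is about 18.016 mg/dL
# (molar mass of glucose is roughly 180.16 g/mol).
MGDL_PER_MMOLL = 18.016


def normalize_temperature(value: float, unit: str) -> float:
    """Return body temperature in degrees Celsius from a "C" or "F" reading."""
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unsupported temperature unit: {unit}")


def normalize_glucose(value: float, unit: str) -> float:
    """Return blood glucose in mg/dL from an "mg/dL" or "mmol/L" reading."""
    if unit == "mg/dL":
        return value
    if unit == "mmol/L":
        return value * MGDL_PER_MMOLL
    raise ValueError(f"unsupported glucose unit: {unit}")
```

Rejecting unknown units with an exception, rather than passing the value through, keeps a mislabelled reading from silently entering the dataset.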

Table 3 Measurement method and scale of different types of health parameters

Data is collected through various methods, depending on the device. Wearable sensors continuously record and transmit data wirelessly to the central processing unit using Bluetooth or Wi-Fi. Smart medical equipment records data during use and either transmits it wirelessly or uploads it manually to the system. Home-based monitoring systems collect data using sensors placed around the home and transmit it wirelessly to the central processing unit. This comprehensive data collection ensures that the system has the most up-to-date information for analysis and early detection of potential health issues. Figure 3 demonstrates the deployment of IoT devices across the network.

Storage and process model

The IoT deployment for pre-natal diagnostics uses different types of CPUs according to specific operational needs. In the proposed study, microcontrollers such as the ARM Cortex-M series are favored for their low power consumption and their ability to handle basic data aggregation and sensor interfacing. Application processors such as the ARM Cortex-A series are chosen for more complex computations such as real-time data analysis and communication management. System-on-chip (SoC) designs integrate these functions into compact solutions, improving device efficiency and performance. Within the pre-natal diagnostics IoT ecosystem, CPUs are deployed at three critical points. First, on-device CPUs are embedded in wearable devices such as smartwatches (e.g., Apple Watch) and fitness trackers (e.g., Fitbit), where they process sensor data locally. Sensor data fusion and basic processing can be represented as:

$$y=f(x)$$

(4)

In Equation (4), x represents the raw sensor data and y the processed output used for real-time health monitoring. Second, gateway CPUs are embedded in gateway devices that aggregate data from multiple sensors before transmitting it to cloud servers. The gateway's transmission rate is given by:

$$R=\frac{D}{T}$$

(5)

In Equation (5), R is the transmission rate, D is the data size, and T is the transmission time (these D and T are distinct from the device and parameter sets of Equations (1) and (2)); keeping R adequate ensures efficient data flow and timely updates. Third, cloud servers are used for intensive data processing and storage. Table 4 lists the CPUs used in the proposed study.
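The two edge stages can be illustrated with a small sketch: an on-device function playing the role of f in Equation (4), and a gateway that bundles readings and computes R = D/T from Equation (5). The windowed-mean choice for f, the plausibility bounds, the JSON payload format, and all names are assumptions for illustration, not the study's implementation.

```python
import json
from dataclasses import asdict, dataclass
from statistics import mean
from typing import List, Optional

# Hypothetical plausibility bounds for wrist-worn heart-rate samples (bpm).
MIN_BPM, MAX_BPM = 30.0, 220.0


def on_device_f(raw_bpm: List[float], window: int) -> List[Optional[float]]:
    """One possible y = f(x): average raw samples in fixed-size windows.

    Implausible samples are dropped first; a window with no valid samples yields None.
    """
    out: List[Optional[float]] = []
    for start in range(0, len(raw_bpm), window):
        valid = [s for s in raw_bpm[start:start + window] if MIN_BPM <= s <= MAX_BPM]
        out.append(mean(valid) if valid else None)
    return out


@dataclass
class Reading:
    device_id: str    # hypothetical identifier, e.g. "smartwatch"
    parameter: str
    value: float
    timestamp: float  # seconds since the Unix epoch


def aggregate(readings: List[Reading]) -> bytes:
    """Gateway step: bundle readings from several devices into one JSON payload."""
    return json.dumps([asdict(r) for r in readings]).encode("utf-8")


def transmission_rate(payload: bytes, seconds: float) -> float:
    """Equation (5), R = D / T, with D in bytes and T in seconds (bytes per second)."""
    if seconds <= 0:
        raise ValueError("transmission time must be positive")
    return len(payload) / seconds
```

Summarizing on the wearable reduces how much data the gateway must forward, which in turn lowers the D needed to keep R within the link's capacity.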

Table 4 Different types of CPUs utilized in Pre-natal diagnostics forecasting

Operational model

Consider a scenario aimed at advancing prenatal care in which Sarah, a pregnant woman in her third trimester, participates in a study that uses IoT devices for continuous health monitoring. Sarah wears a smartwatch equipped with sensors that monitor her heart rate, activity levels, and skin temperature. She also uses a digital blood pressure monitor and a glucose monitor to manage gestational diabetes. The collected data are sent via Bluetooth to her smartphone for real-time updates and syncing. The digital blood pressure monitor checks her blood pressure regularly and connects wirelessly to a home-based gateway device via Bluetooth or Wi-Fi. Similarly, the glucose monitor tracks her glucose levels throughout the day and communicates with the gateway device via Bluetooth or Wi-Fi. The gateway device locally aggregates data from all her health devices (smartwatch, blood pressure monitor, and glucose monitor) and securely transmits the aggregated data to a cloud-based server via Wi-Fi. In the cloud, servers store the health data and analyze it in real time using the DTA. The analysis compares her data against established thresholds and historical patterns to detect trends and anomalies promptly, alerting healthcare providers to potential health issues early and enabling timely interventions and personalized care adjustments. Figure 4 shows the operational model of the proposed study.
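The threshold comparison in this scenario can be sketched as a simple range check per parameter. The bounds below are illustrative placeholders, not clinical guidance and not values from the study; in the proposed system the decision boundaries come from the trained decision tree, and the parameter names are hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative (low, high) bounds only; real thresholds would be set by
# clinicians or, in the proposed system, learned by the decision tree.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "heart_rate": (50.0, 120.0),    # bpm
    "systolic_bp": (90.0, 140.0),   # mmHg
    "diastolic_bp": (60.0, 90.0),   # mmHg
    "glucose": (70.0, 140.0),       # mg/dL
}


def check_reading(reading: Dict[str, float]) -> List[str]:
    """Return the names of parameters whose values fall outside their bounds."""
    alerts = []
    for name, value in reading.items():
        bounds = THRESHOLDS.get(name)
        if bounds is None:
            continue  # no threshold configured for this parameter
        low, high = bounds
        if not low <= value <= high:  # bounds are inclusive
            alerts.append(name)
    return alerts
```

For example, a reading of 150/95 mmHg with normal heart rate and glucose would flag both blood pressure components for review; comparisons against a participant's own historical baseline would be a separate step not shown here.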

Data model

The IoT network continuously collects data and forwards it wirelessly to a sink node. The study on IoT-enabled pre-natal diagnostics involves collecting and analyzing comprehensive health data from pregnant women using IoT devices over a span of two years. The dataset comprises over 100,000 individual records covering key health parameters such as heart rate, activity levels, skin temperature, blood pressure, and glucose levels, collected via smartwatches, digital blood pressure monitors, and glucose monitors. Together, these parameters provide continuous monitoring throughout the participants' pregnancies. From this dataset, a subset of 5,000 records was randomly selected for detailed analysis; the subset represents various stages of pregnancy and health conditions. Data are transmitted via Bluetooth and Wi-Fi to a secure cloud-based server, where they are preprocessed to ensure accuracy and reliability. Table 5 lists the parameters of the dataset.
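A minimal sketch of a record layout and the random subsampling step follows. The field names and the activity unit are assumptions (Tables 5 and 6 list the actual attributes), and the fixed seed is added only so that the drawn subset is reproducible.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class PrenatalRecord:
    # Hypothetical field names; see Tables 5 and 6 for the study's attributes.
    participant_id: int
    heart_rate: float        # bpm
    activity_level: float    # assumed unit, e.g. steps per minute
    skin_temperature: float  # degrees Celsius
    systolic_bp: float       # mmHg
    diastolic_bp: float      # mmHg
    glucose: float           # mg/dL


def sample_records(records: List[PrenatalRecord], k: int, seed: int = 42) -> List[PrenatalRecord]:
    """Draw k records uniformly at random without replacement, reproducibly via seed."""
    rng = random.Random(seed)
    return rng.sample(records, k)
```

Applied to the study's data, this would be called with k = 5000 on the roughly 100,000 collected records; `random.sample` raises `ValueError` if k exceeds the number of records.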

Table 5 Different study parameters for pre-natal diagnostics

Of the data captured in the study, the following parameters are selected for diagnosis: heart rate, activity levels, skin temperature, blood pressure, and glucose levels. Each attribute provides unique insight into the participant's health status and is crucial for monitoring and analyzing trends over time. Table 6 shows the attributes used to determine the pre-natal diagnostic information.

Table 6 Attributes used to determine the pre-natal diagnostics information

Figure 5 depicts overall trends in the maternal health parameters, which are essential for understanding and managing pre-natal diagnostics effectively. First, the line plot of heart rate over time shows fluctuations in heart rate (bpm) throughout pregnancy, highlighting critical trends and potential abnormalities. Second, the histogram of skin temperature shows the frequency and range of temperature measurements, helping to identify typical and outlying readings. Third, the scatter plot of activity level against heart rate provides insight into how physical activity affects cardiovascular health. Fourth, the bar chart groups participants by health condition, indicating the prevalence of conditions such as gestational diabetes and hypertension in the study population. Finally, the box plot shows the distribution and outliers of blood pressure measurements, offering a clear view of vascular health variability and potential hypertension risk.

Similarly, the pair plot in Fig. 6 visualizes the relationships among multiple maternal health parameters, revealing how each parameter varies with the others. For example, the scatter plot of heart rate against activity level may indicate a positive correlation, with higher activity levels corresponding to higher heart rates. The diagonal plots show the distribution of each variable and highlight typical values and outliers.
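Whether a scatter plot such as heart rate against activity level reflects a positive correlation can be quantified with the Pearson coefficient. The sketch below is a plain implementation of that standard formula; it is not taken from the study, and the sample values in the usage note are made up.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Values near +1 indicate that heart rate rises with activity, values near 0 indicate no linear relationship, and values near -1 indicate an inverse relationship.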

Preprocessing model

The dataset contains errors arising from various sources, such as signal attenuation, transmission issues, intermixing of signals, and missing values. These errors reduce the quality and reliability of the collected data, and the analysis and insights a model generates depend on that quality: low-quality data reduce the number of true positives the model detects and degrade its overall performance. The data must therefore be corrected to ensure accurate pre-natal diagnostics. Figure 7 shows the current state of the data and the errors in it.

These errors are removed using the following techniques.

Missing values

Missing values occur when sensors fail to record data points, for example because of sensor malfunctions or transmission failures. This study handles them with mean imputation, in which each missing value x_i is replaced with the mean μ of the available values:

$$x_i=\mu$$

(6)

$$\mu =\frac{1}{n}{\sum }_{j=1}^{n}{x}_{j}$$

(7)

In the above Equation, n is the number of non-missing values.
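Equations (6) and (7) translate directly into a few lines of code. This sketch assumes missing readings are encoded as `None` and that imputation is applied to one parameter's series at a time; both are assumptions rather than details given in the text.

```python
from typing import List, Optional


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries (Eqs. 6-7)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mu = sum(observed) / len(observed)  # n = number of non-missing values
    return [mu if v is None else v for v in values]
```

For instance, a heart-rate series of 80, missing, 84, missing, 82 becomes 80, 82, 84, 82, 82, since the mean of the three observed values is 82.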

Signal attenuation

Signal attenuation degrades sensor readings, which then appear as outliers that differ significantly from normal values. This work uses z-scores to detect and handle these outliers. The z-score z is calculated as follows:

$$z=\frac{x-\mu}{\sigma}$$

(8)

In Equation (8), x is the value, μ is the mean, and σ is the standard deviation, calculated as:

$$\sigma=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n-1}}$$

(9)

If |z| > 3, the data point is considered an outlier and is replaced with the mean or median value.
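The z-score rule of Equations (8) and (9) can be sketched as follows, using the sample standard deviation (n - 1 denominator) to match Equation (9). The function name is hypothetical, and defaulting to median replacement is a design choice made here, not a detail from the study.

```python
from statistics import mean, median, stdev
from typing import List


def replace_outliers(values: List[float], threshold: float = 3.0,
                     use_median: bool = True) -> List[float]:
    """Replace points with |z| > threshold by the series median (or mean).

    z = (x - mu) / sigma, where sigma is the sample standard deviation.
    """
    if len(values) < 2:
        return list(values)
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return list(values)  # constant series: no outliers by this rule
    fill = median(values) if use_median else mu
    return [fill if abs((x - mu) / sigma) > threshold else x for x in values]
```

The median is used by default because the mean is itself pulled toward the outlier it is replacing. Note also that with the sample standard deviation, |z| can never exceed (n - 1)/√n, so in series of 10 points or fewer the |z| > 3 rule cannot flag anything.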

Transmission errors

Transmission errors occur when data points are corrupted during transmission, producing abnormally high or low values. Median filtering is used to handle these errors: for a window half-width k, each data point x_i is replaced with the median of itself and its k neighbors on either side. This effectively smooths out spikes caused by transmission errors.

$$x_i=\operatorname{median}\left(x_{i-k},\dots,x_i,\dots,x_{i+k}\right)$$

(10)
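Equation (10) can be sketched as below. The text does not say how windows are handled at the start and end of a series, so this version assumes truncated edge windows (other options include padding or leaving edge samples unchanged); the function name is hypothetical.

```python
from statistics import median
from typing import List


def median_filter(values: List[float], k: int) -> List[float]:
    """Replace each x_i with the median of x_{i-k} .. x_{i+k} (Eq. 10).

    Near the edges the window is truncated to the available samples (an assumption).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(values)
    return [median(values[max(0, i - k):min(n, i + k + 1)]) for i in range(n)]
```

With k = 1, the series 80, 81, 250, 82, 83 becomes 80.5, 81, 82, 83, 82.5: the 250 spike disappears, while truncated edge windows of even length yield the average of their two middle values.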

Intermixing

Intermixing errors occur when signals from different sources mix, producing incorrect data points. To address this, we use techniques such as Independent Component Analysis (ICA), which separates mixed signals by maximizing their statistical independence. The observed signals x are modeled as linear combinations of independent sources s:

$$x=As$$

(11)

In Eq. (11), A is the mixing matrix. ICA estimates A and s such that the components of s are as independent as possible. Figure 8 compares the original data with the cleaned data.
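The model x = As in Eq. (11) can be illustrated with a deliberately small, dependency-free sketch. It is not FastICA and not necessarily what the authors used; it assumes exactly two observed channels and two non-Gaussian sources, whitens the mixtures, and then searches rotation angles for the one that maximizes the sum of squared excess kurtosis, a classical contrast for two-source ICA. The signals and the mixing matrix `A` are hypothetical.

```python
import math
from typing import List, Tuple

Signal = List[float]

def _center(v: Signal) -> Signal:
    m = sum(v) / len(v)
    return [x - m for x in v]

def _excess_kurtosis(v: Signal) -> float:
    n = len(v)
    m = sum(v) / n
    var = sum((x - m) ** 2 for x in v) / n
    return sum((x - m) ** 4 for x in v) / n / var ** 2 - 3.0

def _whiten(x1: Signal, x2: Signal) -> Tuple[Signal, Signal]:
    """Decorrelate two centred mixtures via the eigenvectors of their 2x2
    covariance matrix, then scale each projection to unit variance."""
    x1, x2 = _center(x1), _center(x2)
    n = len(x1)
    a = sum(u * u for u in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    c = sum(v * v for v in x2) / n
    phi = 0.5 * math.atan2(2 * b, a - c)  # eigenvector angle of [[a, b], [b, c]]
    cp, sp = math.cos(phi), math.sin(phi)
    p1 = [cp * u + sp * v for u, v in zip(x1, x2)]
    p2 = [-sp * u + cp * v for u, v in zip(x1, x2)]
    s1 = math.sqrt(sum(u * u for u in p1) / n)
    s2 = math.sqrt(sum(u * u for u in p2) / n)
    return [u / s1 for u in p1], [u / s2 for u in p2]

def ica_2d(x1: Signal, x2: Signal, steps: int = 90) -> Tuple[Signal, Signal]:
    """Estimate two independent sources (up to order, sign and scale) from x = A s."""
    z1, z2 = _whiten(x1, x2)
    best_score, best = -1.0, (z1, z2)
    for i in range(steps):  # angles in [0, pi/2) suffice up to permutation and sign
        th = (math.pi / 2) * i / steps
        c, s = math.cos(th), math.sin(th)
        y1 = [c * u + s * v for u, v in zip(z1, z2)]
        y2 = [-s * u + c * v for u, v in zip(z1, z2)]
        # Non-Gaussianity contrast: sum of squared excess kurtosis.
        score = _excess_kurtosis(y1) ** 2 + _excess_kurtosis(y2) ** 2
        if score > best_score:
            best_score, best = score, (y1, y2)
    return best

def abs_corr(u: Signal, v: Signal) -> float:
    u, v = _center(u), _center(v)
    num = sum(a * b for a, b in zip(u, v))
    return abs(num) / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

# Hypothetical sources: a sinusoid and a sawtooth, mixed by an assumed matrix A.
N = 1000
s1 = [math.sin(2 * math.pi * t / 50) for t in range(N)]
s2 = [(t % 37) / 37 - 0.5 for t in range(N)]
A = [[1.0, 0.6], [0.4, 1.0]]
x1 = [A[0][0] * p + A[0][1] * q for p, q in zip(s1, s2)]
x2 = [A[1][0] * p + A[1][1] * q for p, q in zip(s1, s2)]
y1, y2 = ica_2d(x1, x2)
print(max(abs_corr(s1, y1), abs_corr(s1, y2)), max(abs_corr(s2, y1), abs_corr(s2, y2)))
```

With more than two channels, a library implementation such as scikit-learn's `FastICA` would be the more practical choice; the brute-force angle search here only scales to the two-channel case.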

Together, these methods remove outliers and missing values from the dataset and leave the data clean. The dataset is then divided into training, validation, and testing subsets. First, we separated the data into two main parts: 80% of the data was used for training the model, and the remaining 20% went to the testing subset. Next, we took a portion of the training data for validation; this validation subset was used to fine-tune the model. We ensured that the training data had enough variety for the model to learn effectively, and the testing subset provided a final check on the model’s performance. This procedure was intended to ensure that the model learned well and generalized accurately to new data.
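The 80/20 split with a validation subset carved from the training part can be sketched as below. The text does not state the validation fraction or the random seed, so `val_frac=0.1` and `seed=42` are hypothetical; the 1000-record size comes from the dataset described in the abstract.

```python
import random
from typing import List, Tuple

def split_indices(n: int, test_frac: float = 0.2, val_frac: float = 0.1,
                  seed: int = 42) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle record indices, hold out a test set, then take validation from training."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_test = round(n * test_frac)
    test, train_full = idx[:n_test], idx[n_test:]
    n_val = round(len(train_full) * val_frac)  # hypothetical share of the training data
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 720 80 200
```

Because "enough variety" in the training data matters for a two-class problem, a stratified split (shuffling normal and at-risk records separately) would be a natural refinement.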

Analysis model

The Decision Tree algorithm is one of the most widely used machine learning methods, and this study uses it to refine the prognosis of the health condition of both mothers and babies. The algorithm builds a model in the form of a decision tree: internal nodes test features or attributes, branches carry decision rules, and leaves hold outcomes or class labels.

The algorithm starts at the root node, where it evaluates every candidate split of the dataset on every feature captured by the IoT devices. These features include maternal heart rate (X₁), maternal blood pressure (X₂), maternal oxygen saturation (X₃), maternal glucose level (X₄), and fetal movement (X₅). The objective is to find the split that best separates the classes in the dataset, in this case normal and high-risk pregnancies. Split quality is measured using Gini impurity or information gain, where information gain is defined as:

$$IG\left(T,a\right)=Entropy\left(T\right)-{\sum }_{v\in {\text{Values}}\left(a\right)}\frac{\left|{T}_{v}\right|}{\left|T\right|}\cdot Entropy\left({T}_{v}\right)$$

(12)

In Eq. (12), T is the full dataset, a is the attribute, and $T_v$ is the subset of T for which attribute a has value v. |T| and $|T_v|$ are the numbers of instances in T and $T_v$, and $\mathrm{Entropy}(T)=-\sum_{i=1}^{C}p_i\log_2 p_i$ is the entropy of T, where $p_i$ is the proportion of instances in class i. The Gini Impurity (G) is calculated as:

$$G\left(T\right)=1-{\sum }_{i=1}^{C}{p}_{i}^{2}$$

(13)

In Eq. (13), $p_i$ is the proportion of instances in T that belong to class i, and C is the total number of classes. Among all candidate splits, the one with the highest information gain (or, equivalently for this purpose, the lowest Gini impurity) becomes the decision rule of the root node. The Gini impurity procedure described above is summarized in the accompanying algorithm.
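Equations (12) and (13) translate directly into code. The following sketch computes entropy, information gain for a categorical attribute, and Gini impurity; the labels and the binary attribute `high_bp` are hypothetical and serve only as a worked example.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence

def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())  # Eq. (13)

def information_gain(labels: Sequence[Hashable], attribute: Sequence[Hashable]) -> float:
    """Eq. (12): Entropy(T) minus the size-weighted entropy of each subset T_v."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for y, v in zip(labels, attribute):
        groups.setdefault(v, []).append(y)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

labels = ["Normal", "Normal", "At-Risk", "At-Risk"]
high_bp = [False, False, True, True]  # hypothetical attribute that separates the classes
print(entropy(labels), gini(labels), information_gain(labels, high_bp))  # 1.0 0.5 1.0
```

An attribute that splits each class evenly, such as `[True, False, True, False]`, yields an information gain of 0.0, so it would never be chosen at the root.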

Next, the algorithm recursively applies the same process to each subset of the data, continuing to split at each node on the remaining features and creating new branches and nodes. For example, if the root node splits on maternal heart rate (X₁), the next node might split on blood pressure (X₂), and so on. The recursion continues until a stopping criterion is met, such as a maximum tree depth or a minimum number of samples per leaf node, or until further splits no longer improve the model’s performance; the result is a fully grown tree. At each node, the decision rule can be represented as:

$$\text{If } X_j \le \theta\text{, go to the left child node; otherwise, go to the right child node.}$$

(14)

In Eq. (14), $X_j$ is the feature and $\theta$ is the threshold value for the split. The accompanying algorithm describes how the decision tree algorithm works.
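The recursive procedure and the threshold rule of Eq. (14) can be sketched as a small CART-style tree that picks, at each node, the threshold with the lowest size-weighted Gini impurity and stops at a maximum depth, a minimum leaf size, or when no split reduces impurity. This is an illustrative sketch under those assumptions, not the authors' implementation; the two features (heart rate in bpm, systolic blood pressure in mmHg) and all records are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

def gini(labels: List[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X: List[List[float]], y: List[str],
               min_samples_leaf: int) -> Optional[Tuple[int, float, float]]:
    """Return (feature j, threshold theta, weighted Gini) of the best rule X_j <= theta."""
    best = None
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            theta = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [y[i] for i in range(n) if X[i][j] <= theta]
            right = [y[i] for i in range(n) if X[i][j] > theta]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, theta, score)
    return best

def build_tree(X: List[List[float]], y: List[str], depth: int = 0,
               max_depth: int = 3, min_samples_leaf: int = 1) -> Dict[str, Any]:
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return {"leaf": majority}
    split = best_split(X, y, min_samples_leaf)
    if split is None or split[2] >= gini(y):  # stop when no split reduces impurity
        return {"leaf": majority}
    j, theta, _ = split
    left = [i for i, row in enumerate(X) if row[j] <= theta]
    right = [i for i, row in enumerate(X) if row[j] > theta]
    return {
        "feature": j,
        "threshold": theta,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_samples_leaf),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_samples_leaf),
    }

def predict(node: Dict[str, Any], x: List[float]) -> str:
    while "leaf" not in node:
        # Eq. (14): go left if X_j <= theta, otherwise go right.
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

# Hypothetical records: [maternal heart rate (bpm), systolic blood pressure (mmHg)].
X = [[78, 112], [85, 118], [88, 125], [80, 115],
     [95, 119], [102, 135], [84, 140], [110, 128]]
y = ["Normal", "Normal", "Normal", "Normal",
     "At-Risk", "At-Risk", "At-Risk", "At-Risk"]
tree = build_tree(X, y)
print(predict(tree, [120, 110]), predict(tree, [82, 120]))  # At-Risk Normal
```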

One advantage of decision trees is their ability to handle both numerical and categorical data. They can also handle missing values and do not require feature scaling. Decision trees naturally perform feature selection during training: they choose the most important features for splitting the data, which reduces the need for manual feature selection. In our study, the tree automatically identifies the most critical health parameters for diagnosing prenatal conditions (Figure 9). The data were collected from various IoT devices that monitored maternal heart rate, blood pressure, oxygen saturation, glucose level, and fetal movement, and the decision tree algorithm used all of these features to make its predictions. The tree shows that if the maternal heart rate is at or below 90 bpm, the algorithm checks the blood pressure; if that is also at or below 120 mmHg, it further examines the oxygen saturation level to predict whether the outcome is 'Normal' or 'At-Risk.' This hierarchical decision process continues through the tree, using further thresholds to predict the health outcome. The trained model was able to predict 'Normal' or 'At-Risk' conditions from the collected health metrics, highlighting its potential for improving prenatal diagnostic accuracy.

However, decision trees have some drawbacks. They can easily overfit the training data, especially when the tree is deep. Overfitting occurs when the model captures noise in the training data, which leads to poor generalization to new data. To combat overfitting, this study uses pruning, which removes some branches of the tree after it is fully grown. This simplifies the model and improves its generalization ability. Pruning can be expressed as minimizing the cost-complexity measure $C_\alpha(T)$ for a given complexity parameter α:

$$C_{\alpha}(T)=\sum_{m=1}^{|T|}P_m\left(1-P_m\right)+\alpha\,|T|$$

(15)

In Eq. (15), T is the set of leaf nodes and |T| is the number of leaves, $P_m$ is the proportion of instances of a given class in leaf node m, and α is the complexity parameter that penalizes larger trees.
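As a worked sketch of Eq. (15), the criterion can be evaluated for a tree before and after collapsing one subtree. The leaf proportions are hypothetical, and merging two leaves with proportions 0.9 and 0.1 into one leaf with proportion 0.5 assumes the two leaves hold equal numbers of instances.

```python
from typing import Sequence

def cost_complexity(leaf_proportions: Sequence[float], alpha: float) -> float:
    """Eq. (15): sum of P_m (1 - P_m) over the leaves plus alpha * |T|."""
    impurity = sum(p * (1.0 - p) for p in leaf_proportions)
    return impurity + alpha * len(leaf_proportions)

unpruned = [1.0, 0.0, 0.9, 0.1]  # hypothetical leaves of the full tree
pruned = [1.0, 0.0, 0.5]         # last two leaves merged (equal sizes assumed)

for alpha in (0.05, 0.10):
    keep = cost_complexity(unpruned, alpha) <= cost_complexity(pruned, alpha)
    print(alpha, "keep subtree" if keep else "prune subtree")
# 0.05 keep subtree
# 0.1 prune subtree
```

The impurity term rises from 0.18 to 0.25 when the subtree is collapsed, while the leaf penalty falls by α, so pruning wins once α exceeds 0.07. Larger α values therefore favor smaller trees.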

Results and evaluation

To evaluate the performance and effectiveness of the Decision Tree-Driven IoT system for prenatal diagnostics, several performance metrics are used to assess the model’s accuracy. These metrics are based on the counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). TP is the number of instances correctly predicted as positive; a high TP indicates that the model correctly identifies positive cases. TN is the number of instances correctly predicted as negative; a high TN indicates that the model correctly identifies negative cases. FP is the number of instances incorrectly predicted as positive; a high FP means the model generates false alarms, predicting positive cases where there are none. Likewise, FN is the number of instances incorrectly predicted as negative; a high FN means the model misses positive cases, failing to identify actual positive instances. Table 7 shows the scores obtained after testing the model.

Table 7 Measurement of actual and predicted values for the proposed dataset


We derived these metrics from the confusion matrix, which gives a detailed breakdown of the model’s performance as counts of TP, TN, FP, and FN. High TP and TN counts indicated that the model accurately predicted normal and at-risk cases, whereas high FP and FN counts highlighted the model’s prediction errors. Figure 10 shows the confusion matrix for this study.
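The four confusion-matrix counts can be tallied from paired actual and predicted labels as sketched below. Treating 'At-Risk' as the positive class is an assumption (the text does not state which class is positive), and the labels are hypothetical.

```python
from typing import Dict, Sequence

def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str = "At-Risk") -> Dict[str, int]:
    """Count TP, TN, FP and FN for a binary classification task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts

y_true = ["At-Risk", "Normal", "At-Risk", "Normal", "Normal"]
y_pred = ["At-Risk", "At-Risk", "Normal", "Normal", "Normal"]
print(confusion_counts(y_true, y_pred))  # {'TP': 1, 'TN': 2, 'FP': 1, 'FN': 1}
```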

The ROC curve in Figure 11 shows how well each classifier distinguishes between the positive and negative classes across threshold settings by plotting the True Positive Rate against the False Positive Rate. The AUC (Area Under the Curve) summarizes this performance in a single value, with higher values indicating better classification performance.
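The ROC points and the AUC can be computed from classifier scores as in the following sketch, which lowers the decision threshold through every distinct score and integrates with the trapezoidal rule. The scores and labels are hypothetical, with 1 marking a positive case.

```python
from typing import List, Sequence, Tuple

def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs for every threshold 'score >= thr', from strictest to loosest."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

print(auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])))  # 0.75
```

The value 0.75 equals the fraction of positive/negative pairs in which the positive case receives the higher score (3 of 4), which is the usual probabilistic reading of AUC.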

We calculated several key metrics from the confusion matrix values. Accuracy measured the overall correctness of the model. Precision evaluated the proportion of true positive predictions among all positive predictions. Recall assessed the model’s ability to identify true positive cases among all actual positive cases. The F1-score, the harmonic mean of precision and recall, provided a balance between the two. They are calculated using the following equations, respectively.

$$Accuracy =\frac{TP+TN}{TP+FP+TN+FN}$$

(16)

$$Precision= \frac{TP}{(TP + FP) }$$

(17)

$$Recall=\frac{TP}{TP+FN}$$

(18)

$$F1\;Score=2\times\frac{Precision\times Recall}{Precision+Recall}$$

(19)

In Eqs. (16) to (19), TP is true positive, TN is true negative, FP is false positive, and FN represents false negative. The model achieved an accuracy of 87.5%. The precision score of 87.5% showed that 87.5% of the model’s positive predictions were correct, and the recall score of 77.8% showed that the model identified 77.8% of the actual positive cases. The F1-score of 82.4% balanced precision and recall. Figure 12 compares the proposed model with other major models, namely Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Linear regression.
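Equations (16) to (19) can be implemented in a few lines. In the sketch below, the helper names are hypothetical, the counts passed to `classification_metrics` are illustrative and not the values in Table 7, and returning 0 for an empty denominator is an assumed convention. The final line checks that the reported precision (87.5%) and recall (77.8%) are consistent with the reported F1-score of 82.4%.

```python
from typing import Dict

def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0  # assumed convention when a denominator is zero

def f1_score(precision: float, recall: float) -> float:
    return _ratio(2 * precision * recall, precision + recall)  # Eq. (19)

def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    precision = _ratio(tp, tp + fp)  # Eq. (17)
    recall = _ratio(tp, tp + fn)     # Eq. (18)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),  # Eq. (16)
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }

print(classification_metrics(tp=8, tn=10, fp=2, fn=4))  # hypothetical counts
print(round(f1_score(0.875, 0.778), 3))  # 0.824
```

Because F1 is a harmonic mean, it sits closer to the weaker of the two scores: 0.824 lies nearer the 0.778 recall than the 0.875 precision.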

Overall, the results show that the Decision Tree-Driven IoT system performed well in prenatal diagnostics. The high accuracy and the precision and recall scores highlight the model’s potential in real-world applications. DTA improves the overall process of pre-natal diagnostics by enhancing the accuracy and interpretability of risk assessments. Figure 13 compares the risk levels identified by traditional methods with those identified by DTA, illustrating the improved precision of DTA in predicting high-risk cases. The classification reports confirm these findings and show that, compared with traditional methods, the proposed model assigns lower risk levels, leading to better-informed decisions in pre-natal care.

Likewise, Fig. 14 compares the performance of Decision Tree-Based methods (DTA) and traditional methods across various pre-natal diagnostic types. DTA consistently outperforms traditional methods in most diagnostic categories, achieving higher effectiveness scores in assessing maternal health and fetal development and in predicting outcomes. Traditional methods fall short, especially in integrating multiple factors and providing comprehensive risk assessments. The differences in effectiveness scores highlight the capability of DTA to deliver accurate and actionable insights for pre-natal diagnostics.

Conclusion and research plan

The integration of IoT systems with decision tree algorithms offers a new way to improve pre-natal diagnostics. The system provides real-time monitoring and remote diagnostics and issues timely alerts, addressing the limitations of traditional methods. It is cost-effective and accessible, benefiting expectant mothers, especially in remote areas. The system was evaluated against conventional models using performance metrics such as accuracy, sensitivity, and specificity, and its effects on different aspects of patient care were also evaluated. The results show that the proposed model can improve the detection of pre-natal conditions and generate timely reports. Future research will focus on improving the decision tree algorithm by exploring advanced machine learning techniques to increase accuracy. Ensuring data privacy and security remains a priority, and developing infrastructure in remote areas will support the system’s deployment. Lastly, large-scale clinical trials will be conducted to validate the system’s effectiveness.

Data availability

The data that support the findings of this study are available from the corresponding author upon request.

References

Xu W, Sampson M. Prenatal and childbirth risk factors of postpartum pain and depression: a machine learning approach. Matern Child Health J. 2023;27(2):286–96.
Young D, Khan N, Hobson SR, Sussman D. Diagnosis of placenta accreta spectrum using ultrasound texture feature fusion and machine learning. Comput Biol Med. 2024;178:108757.
Zhu C. An adaptive agent decision model based on deep reinforcement learning and autonomous learning. J Logist Inform Serv Sci. 2023;10(3):107–18. https://doi.org/10.33168/JLISS.2023.0309.
Duyzend MH, Cacheiro P, Jacobsen JO, Giordano J, Brand H, Wapner RJ, ...Smedley D. Improving prenatal diagnosis through standards and aggregation. Prenat Diagn. 2024;44(4):454-464.
She J, Huang H, Ye Z, Huang W, Sun Y, Liu C, ...Ning G. Automatic biometry of fetal brain MRIs using deep and machine learning techniques. Sci Rep. 2023;13(1):17860.
Yang H, Zheng J, Wang W, Lin J, Wang J, Liu L,... Liao Y. Zr-MOF Carrier-enhanced dual-mode biosensing platforms for rapid and sensitive diagnosis of Mpox. Adv Sci. 2024:2405848. https://doi.org/10.1002/advs.202405848.
Capilla-Lasheras P, Wilson AJ, Young AJ. Mothers in a cooperatively breeding bird increase investment per offspring at the pre-natal stage when they will have more help with post-natal care. PLoS Biol. 2023;21(11):e3002356.
Fan Z, Liu Y, Ye Y, Liao Y. Functional probes for the diagnosis and treatment of infectious diseases. Aggregate. 2024:e620. https://doi.org/10.1002/agt2.620.
Huang Y, Alvernaz S, Kim SJ, Maki P, Dai Y, Bernabé BP. Predicting prenatal depression and assessing model bias using machine learning models. Biol Psychiatry Glob Open Sci. 2024;4:100376.
Chen X, Xu D, Gu X, Li Z, Zhang Y, Wu P, Li Y. Machine learning in prenatal MRI predicts postnatal ventricular abnormalities in fetuses with isolated ventriculomegaly. Eur Radiol. 2024;34(11):7115-24. https://doi.org/10.1007/s00330-024-10785-6.
Arain Z, Iliodromiti S, Slabaugh G, David AL, Chowdhury TT. Machine learning and disease prediction in obstetrics. Curr Res Physiol. 2023;6:100099.
Liscovitch-Brauer N, Mesika R, Rabinowitz T, Volkov H, Grad M, Matar RT, Shomron N. Machine learning-enhanced noninvasive prenatal testing of monogenic disorders. Prenat Diagn. 2024;44(9). https://doi.org/10.1002/pd.6570.
Shi M, Hu W, Li M, Zhang J, Song X,... Sun W. Ensemble regression based on polynomial regression-based decision tree and its application in the in-situ data of tunnel boring machine. Mech Syst Signal Process. 2023;188:110022. https://doi.org/10.1016/j.ymssp.2022.110022.
Zhang C, Ge H, Zhang S, Liu D, Jiang Z, Lan C,... Hu R. Hematoma evacuation via image-guided para-corticospinal tract approach in patients with spontaneous intracerebral hemorrhage. Neurol Ther. 2021;10(2):1001-1013. https://doi.org/10.1007/s40120-021-00279-8.
Shukla V, Sarkar A, Mohanty A. Machine Learning-Based Fetal Assessment: A Non-Intrusive Approach. In 2024 3rd International Conference on Artificial Intelligence For Internet of Things (AIIoT). IEEE. 2024. p. 1-6.
Sun T, Lv J, Zhao X, Li W, Zhang Z,... Nie L. In vivo liver function reserve assessments in alcoholic liver disease by scalable photoacoustic imaging. Photoacoustics. 2023;34:100569. https://doi.org/10.1016/j.pacs.2023.100569.
Carlton K, Zhang J, Cabacungan E, Herrera S, Koop J, Yan K, Cohen S. Machine learning risk stratification for high-risk infant follow-up of term and late preterm infants. Pediatr Res. 2024;11(2):1–9.
Țarălungă DD, Manea I, Preoteasa RM, Florea BC, Neagu GM. Artificial Intelligence Advancements in Fetal Monitoring: Enhancing Prenatal Care. In: European Medical and Biological Engineering Conference. Springer Nature Switzerland: Cham; 2024. p. 106–14.
Lin X, Lu L, Pan J. Hospital market competition and health technology diffusion: an empirical study of laparoscopic appendectomy in China. Soc Sci Med. 2021;286:114316. https://doi.org/10.1016/j.socscimed.2021.114316.
Xia J, Cai Z, Heidari AA, Ye Y, Chen H,... Pan Z. Enhanced moth-flame optimizer with quasi-reflection and refraction learning with application to image segmentation and medical diagnosis. Curr Bioinform. 2023;18(2):109-142. https://doi.org/10.2174/1574893617666220920102401.
Liscovitch-Brauer N, Mesika R, Rabinowitz T, Volkov H, Grad M, Matar RT, Shomron N. P766: A combined Bayesian inference and machine-learning approach for prenatal screening by cell free DNA of monogenic disorders. Genet Med Open. 2024;7(2). https://doi.org/10.1016/j.gimo.2024.101674.
Gomathi R, Menaka K. A Support Vector Machine Classifier Approach for Predicting Preeclampsia and Gestational Hypertension. In: International Conference on Multi-Strategy Learning Environment. Springer Nature Singapore: Singapore; 2024. p. 99–112.
Nield LE, Manlhiot C, Magor K, Freud L, Chinni B, Ims A, Ronzoni S. Machine learning to predict outcomes of fetal cardiac disease: a pilot study. Pediatr Cardiol. 2024;12(2):1–7.
Li Q, You T, Chen J, Zhang Y, Du C. LI-EMRSQL: linking Information enhanced Text2SQL parsing on complex electronic medical records. IEEE Trans Reliab. 2024;73(2):1280–90. https://doi.org/10.1109/TR.2023.3336330.
Wang Q, Jiang Q, Yang Y, Pan J. The burden of travel for care and its influencing factors in China: an inpatient-based study of travel time. J Transp Health. 2022;25:101353. https://doi.org/10.1016/j.jth.2022.101353.

Acknowledgements

None.

Funding

Not applicable.

Author information

Authors and Affiliations

Prenatal Diagnosis Center, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China
Xuewen Yang & Ling Liu
School of Computer, Central China Normal University, Wuhan, 430079, China
Yan Wang


Contributions

Conceptualization, Y.W.Y; study design, figures, and tables, L.L; writing—original draft preparation, Y.W.Y, L.L, Y.W; writing—review and editing, Y.W.Y, L.L, Y.W; All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Xuewen Yang.

Ethics declarations

Ethics approval and consent to participate

This study complies with the Declaration of Helsinki. All participants provided written informed consent. All proposals were approved by the Research Ethics Committee of Zhengzhou University.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Yang, X., Liu, L. & Wang, Y. A Decision Tree-Driven IoT systems for improved pre-natal diagnostic accuracy. BMC Med Inform Decis Mak 24, 375 (2024). https://doi.org/10.1186/s12911-024-02759-x


Received: 21 July 2024
Accepted: 11 November 2024
Published: 05 December 2024
DOI: https://doi.org/10.1186/s12911-024-02759-x
