A software tool for applying Bayes' theorem in medical diagnostics

Chatzimichail, Theodora; Hatjimihail, Aristides T.

doi:10.1186/s12911-024-02721-x

Software
Open access
Published: 21 December 2024

BMC Medical Informatics and Decision Making volume 24, Article number: 399 (2024)

A Correction to this article was published on 28 January 2025

This article has been updated

Abstract

Background

In medical diagnostics, estimating post-test or posterior probabilities for disease, positive and negative predictive values, and their associated uncertainty is essential for patient care.

Objective

The aim of this work is to introduce a software tool developed in the Wolfram Language for the parametric estimation, visualization, and comparison of Bayesian diagnostic measures and their uncertainty.

Methods

This tool employs Bayes' theorem to estimate positive and negative predictive values and posterior probabilities for the presence and absence of a disease. It estimates their standard sampling, measurement, and combined uncertainty, as well as their confidence intervals, applying uncertainty propagation methods based on first-order Taylor series approximations. It employs normal, lognormal, and gamma distributions.

Results

The software generates plots and tables of the estimates to support clinical decision-making. An illustrative case study using fasting plasma glucose data from the National Health and Nutrition Examination Survey (NHANES) demonstrates its application in diagnosing diabetes mellitus. The results highlight the significant impact of measurement uncertainty on Bayesian diagnostic measures, particularly on positive predictive value and posterior probabilities.

Conclusion

The software tool enhances the estimation and facilitates the comparison of Bayesian diagnostic measures, which are critical for medical practice. It provides a framework for their uncertainty quantification and assists in understanding and applying Bayes' theorem in medical diagnostics.

Introduction

Medical diagnosis

Diagnosis in medicine is fundamentally the process of identifying a disease by analyzing its unique characteristics through abduction, deduction, and induction [1]. The term 'diagnosis', originating from the Greek 'διάγνωσις' meaning 'discernment' [2], underscores the critical role of distinguishing between healthy and diseased states in individuals. Diagnosis can be defined as the stochastic mapping of symptoms, signs, as well as laboratory and medical imaging findings onto a particular disease condition, based on medical knowledge.

Threshold-based diagnosis

Diagnostic tests or procedures are often applied to classify individuals into diseased or nondiseased populations in a binary manner. Although the probability distributions of measurands from a quantitative diagnostic test in these populations may overlap, results are typically dichotomized by setting a diagnostic threshold or cut-off point [3]. Reliance on a single threshold for diagnosis across a spectrum of data points introduces uncertainty due to this overlap [4]. Nonetheless, this dichotomous approach represents a significant transformation in medical decision-making by correlating a continuous spectrum of evidence with binary clinical decisions, such as whether to treat or not [5].

Diagnostic accuracy measures

To ensure patient safety, the correctness of this classification must be rigorously evaluated. Although numerous diagnostic accuracy measures are described in the literature, only a few are routinely used in clinical research and practice to assess the diagnostic accuracy of threshold-based tests [6]. These include the prevalence-dependent positive and negative predictive values, defined conditionally on the test outcome.

Bayesian diagnosis

Bayes' theorem [7, 8] plays a pivotal role in medical diagnostics by transforming the pre-test or prior probability for a disease into a post-test or posterior probability after considering diagnostic test results [4, 7, 9, 10, 11, 12]. This theorem connects the posterior probability P(H|E) of a hypothesis H being true given specific evidence E to the likelihood P(E|H) of observing the evidence E given that hypothesis H is true [13].

Bayesian inference

In purely Bayesian inference, the process begins with a prior distribution representing initial beliefs about the parameters of interest before observing any evidence. This prior distribution is then updated with the likelihood function—which represents the probability of the observed evidence given different parameter values—using Bayes' theorem to obtain the posterior distribution [10].

a) Prior distribution

The prior distribution embodies the beliefs held by researchers about parameters before seeing the evidence. Priors can be informative, weakly informative, or diffuse, depending on the level of certainty or uncertainty they reflect.

b) Likelihood function

The likelihood function describes the probability of the observed evidence given various parameter values. It is essential in updating the prior distribution to form the posterior distribution.

c) Posterior distribution

The posterior distribution results from combining the prior distribution and the likelihood function. It reflects the updated understanding of the parameters after considering the observed evidence.

d) Workflow

The typical Bayesian workflow involves:

a. Specifying the prior distribution: defining initial beliefs about the parameters based on prior knowledge or assumptions.

b. Determining the likelihood function: modeling how likely the observed data is, given different parameter values.

c. Computing the posterior distribution: applying Bayes' theorem to update the prior distribution with the likelihood function.

d. Model checking and refinement: assessing the model's fit and making necessary adjustments.

e. Sensitivity analysis: evaluating how sensitive the results are to changes in the prior assumptions or model specifications.

These steps ensure the robustness of Bayesian inference.

Empirical Bayesian methods

The empirical Bayesian approach simplifies the purely Bayesian framework by using available data to estimate the prior distribution, making it practical when prior information is sparse or unavailable [14, 15]. Instead of specifying a fixed prior distribution, the empirical Bayesian method treats the prior as an unknown quantity to be estimated from this data. This approach is particularly suitable for medical diagnostics, where real-time data integration is crucial.

The typical empirical Bayesian workflow involves:

a) Data collection and preliminary analysis: gathering a large dataset and performing statistical analyses to understand the distributions and characteristics of available data.

b) Estimating prior distributions: using empirical data to estimate prior distributions and probabilities through methods such as maximum likelihood estimation.

c) Applying Bayes' theorem: computing posterior probabilities by combining the estimated prior distributions and probabilities with the likelihood function, thereby incorporating the observed data.

This method allows for adaptive updating of beliefs based on available data, enhancing the applicability of Bayesian methods in practical settings where prior information may be limited.
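The three steps of this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic (drawn with hypothetical parameters, not NHANES values), lognormal distributions are assumed for both populations, and the function names `fit_lognormal_mle`, `lognormal_pdf`, and `posterior` are introduced here for the example and are not part of the software described in this article.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

def fit_lognormal_mle(values):
    """MLE of a lognormal: mean and population SD of the log-values."""
    logs = [math.log(x) for x in values]
    return fmean(logs), pstdev(logs)

def lognormal_pdf(t, mu, sigma):
    """Lognormal density at t > 0 with log-scale parameters mu and sigma."""
    return NormalDist(mu, sigma).pdf(math.log(t)) / t

# Step a: a synthetic labelled sample (hypothetical values, for illustration only).
rng = random.Random(0)
diseased = [rng.lognormvariate(4.8, 0.15) for _ in range(200)]
nondiseased = [rng.lognormvariate(4.6, 0.10) for _ in range(800)]

# Step b: estimate the prior probability and both distributions from the data.
prior = len(diseased) / (len(diseased) + len(nondiseased))
mu_d, sigma_d = fit_lognormal_mle(diseased)
mu_n, sigma_n = fit_lognormal_mle(nondiseased)

# Step c: apply Bayes' theorem to a single measurement t.
def posterior(t):
    numerator = lognormal_pdf(t, mu_d, sigma_d) * prior
    return numerator / (numerator + lognormal_pdf(t, mu_n, sigma_n) * (1.0 - prior))
```

Because the prior and the distribution parameters are all estimated from the same sample, their sampling uncertainty carries over to the posterior probability.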

Uncertainty

Uncertainty reflects imperfect or incomplete information. When quantifiable, it can be expressed using probability [16]. In our empirical Bayesian approach, we integrate frequentist methods for uncertainty quantification due to their established reliability and ease of implementation in clinical settings [17].

Measurement uncertainty

Due to the intrinsic variability of measurements, measurement uncertainty is defined as a 'parameter associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand'. This measurement uncertainty concept supplants the traditional notion of total analytical error [18].

Sampling uncertainty

Diagnostic measures are derived from screening or diagnostic tests applied to population samples. The variability within these samples contributes to the overall uncertainty of the measures [19]. This intrinsic heterogeneity is present even when simple random sampling techniques are employed [20].

Uncertainty of diagnostic accuracy measures and Bayesian posterior probabilities

Previous studies have explored the uncertainty associated with diagnostic accuracy measures and the posterior probabilities for disease derived from Bayes' theorem, demonstrating that this uncertainty can significantly impact their clinical usefulness [21, 22]. Estimating, evaluating, and mitigating this uncertainty are critical tasks in medical diagnosis.

Bayesian diagnostic measures

This project introduces a novel software tool designed for the parametric estimation and visualization of four diagnostic measures derived from Bayes' theorem, along with their associated uncertainty:

a) Positive predictive value and negative predictive value [11].

b) Posterior probability for disease and its complement, posterior probability for the absence of disease.

To the best of our knowledge, this is the first publication that compares the four Bayesian diagnostic measures listed above and their associated uncertainty.

Methods

Calculations

Calculation of Bayesian diagnostic measures

Bayes' theorem relates the probability $P\left(H|E\right)$ of a hypothesis $H$ being true given observed evidence $E$ to the inverse probability $P\left(E|H\right)$ of observing $E$ given that $H$ is true. It is expressed as:

$$P\left(\left.H\right|E\right)=\frac{P\left(\left.E\right|H\right)P\left(H\right)}{P\left(E\right)}=\frac{P\left(\left.E\right|H\right)P\left(H\right)}{P\left(\left.E\right|H\right)P\left(H\right)+P\left(\left.E\right|\overline H\right)P\left(\overline H\right)}$$

where $\overline{H }$ represents the negation of hypothesis $H$. Since $P\left(\overline{H }\right)=1-P\left(H\right)$, substituting into Bayes' theorem gives:

$$P\left(H|E\right)=\frac{P\left(E|H\right)P\left(H\right)}{P\left(E|H\right)P\left(H\right)+P\left(E|\overline{H }\right)\left(1-P\left(H\right)\right)}$$

In medical diagnostics, Bayes' theorem provides a robust framework for updating the probability of a disease (hypothesis $H$) being present given new evidence $E$ (such as test results). By combining prior knowledge (pre-test probability) with new data (test results), Bayesian methods offer a comprehensive approach to the medical diagnostic process.
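As a minimal numerical illustration of the second form of the theorem, the following Python sketch evaluates $P(H|E)$ from a prior probability and the two conditional probabilities. The function name `bayes_posterior` and the input values are hypothetical and chosen only for the example.

```python
def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) from P(H), P(E|H) and P(E|not H)."""
    numerator = p_e_given_h * prior
    denominator = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / denominator

# Hypothetical inputs: P(H) = 0.10, P(E|H) = 0.9, P(E|not H) = 0.2.
post = bayes_posterior(0.10, 0.9, 0.2)
print(round(post, 4))  # 0.09 / (0.09 + 0.18) = 1/3, printed as 0.3333
```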

Positive and negative predictive value

Let $D$ denote the presence and $\overline{D }$ the absence of a disease, ${F}_{D}\left(x|{\varvec{\theta}}\right)$ the cumulative distribution function (CDF) of the test measurements $T$ in individuals with the disease, ${F}_{\overline{D} }\left(x|{\varvec{\theta}}\right)$ the CDF in individuals without the disease, and $v$ the prevalence or prior probability for disease. The positive predictive value of a diagnostic test $T$ for a diagnostic threshold $t$ is calculated as:

$$P\left(D|T\ge t\right)=\frac{\left(1-{F}_{D}\left(t|{\varvec{\theta}}\right)\right)v}{\left(1-{F}_{D}\left(t|{\varvec{\theta}}\right)\right)v+\left(1-{F}_{\overline{D} }\left(t|{\varvec{\theta}}\right)\right)\left(1-v\right)}$$

Similarly, the negative predictive value is:

$$P\left(\overline{D }|T<t\right)=\frac{{F}_{\overline{D} }\left(t|{\varvec{\theta}}\right)\left(1-v\right)}{{F}_{\overline{D} }\left(t|{\varvec{\theta}}\right)\left(1-v\right)+{F}_{D}\left(t|{\varvec{\theta}}\right)v}$$

In these equations, $1-{F}_{D}\left(t|{\varvec{\theta}}\right)$ represents the sensitivity of the test at threshold $t$ and ${F}_{\overline{D} }\left(t|{\varvec{\theta}}\right)$ its specificity.

These measures assess the test's ability to correctly identify diseased and nondiseased individuals based on the threshold $t$.
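The two predictive values can be sketched in Python as follows. The sketch assumes lognormal distributions specified by their mean and standard deviation; the numerical parameters are those reported in the illustrative case study later in this article, while the helper names `lognormal_cdf` and `predictive_values` are introduced here for illustration. It is not the Wolfram Language implementation and should not be expected to reproduce the reported figures exactly.

```python
import math
from statistics import NormalDist

def lognormal_cdf(t, mean, sd):
    """CDF at t > 0 of a lognormal distribution given its mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return NormalDist(mu, math.sqrt(sigma2)).cdf(math.log(t))

def predictive_values(t, v, cdf_d, cdf_nd):
    """Return (PPV, NPV) at threshold t for prevalence v."""
    sens = 1.0 - cdf_d(t)   # sensitivity: 1 - F_D(t)
    spec = cdf_nd(t)        # specificity: F_notD(t)
    ppv = sens * v / (sens * v + (1.0 - spec) * (1.0 - v))
    npv = spec * (1.0 - v) / (spec * (1.0 - v) + (1.0 - sens) * v)
    return ppv, npv

# Case-study parameters (mg/dl), used here only as example inputs.
cdf_d = lambda t: lognormal_cdf(t, 120.671, 17.713)
cdf_nd = lambda t: lognormal_cdf(t, 102.642, 10.747)
ppv, npv = predictive_values(126.0, 0.158, cdf_d, cdf_nd)
```

For very low thresholds almost every individual tests positive, so the positive predictive value approaches the prevalence $v$.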

Posterior probability for disease and the absence of disease

Let ${f}_{D}\left(x|{\varvec{\theta}}\right)$ denote the probability density function (PDF) of the test measurements $T$ in individuals with the disease, ${f}_{\overline{D} }\left(x|{\varvec{\theta}}\right)$ the PDF in individuals without the disease, and $v$ the prevalence or prior probability for disease. The posterior or post-test probability for disease given a diagnostic test result $T=t$ is:

$$P\left(D|T=t\right)=\frac{{f}_{D}\left(t|{\varvec{\theta}}\right)v}{{f}_{D}\left(t|{\varvec{\theta}}\right)v+{f}_{\overline{D} }\left(t|{\varvec{\theta}}\right)\left(1-v\right)}$$

Similarly, the posterior or post-test probability for the absence of disease is:

$$\begin{aligned}P\left(\overline{D }|T=t\right)&=\frac{{f}_{\overline{D} }\left(t|{\varvec{\theta}}\right)\left(1-v\right)}{{f}_{\overline{D} }\left(t|{\varvec{\theta}}\right)\left(1-v\right)+{f}_{D}\left(t|{\varvec{\theta}}\right)v}\\&=1-P\left(D|T=t\right)\end{aligned}$$

These posterior probabilities provide a continuous assessment of disease likelihood based on the test measurement $t$, rather than dichotomizing the results using a threshold.
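A corresponding Python sketch for the posterior probability replaces the CDFs with PDFs. As before, lognormal distributions given by mean and standard deviation are assumed, the parameters are taken from the illustrative case study reported later, and the names `lognormal_pdf` and `posterior_disease` are hypothetical.

```python
import math
from statistics import NormalDist

def lognormal_pdf(t, mean, sd):
    """PDF at t > 0 of a lognormal distribution given its mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return NormalDist(mu, math.sqrt(sigma2)).pdf(math.log(t)) / t

def posterior_disease(t, v, pdf_d, pdf_nd):
    """Return P(D | T = t) for prevalence v."""
    numerator = pdf_d(t) * v
    return numerator / (numerator + pdf_nd(t) * (1.0 - v))

# Case-study parameters (mg/dl), used here only as example inputs.
pdf_d = lambda t: lognormal_pdf(t, 120.671, 17.713)
pdf_nd = lambda t: lognormal_pdf(t, 102.642, 10.747)

p_126 = posterior_disease(126.0, 0.158, pdf_d, pdf_nd)
p_absent_126 = 1.0 - p_126   # posterior probability for the absence of disease
```

Unlike the positive predictive value, this quantity need not be monotonic in $t$: because the two densities have different spreads, their ratio can rise again in the far lower tail.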

Uncertainty quantification

Uncertainty in input parameters can be represented as standard uncertainty $u(t)$, which is the standard deviation of $t$, and expanded uncertainty $U(t)$, which defines a range around $t$ with a specified probability $p$ [23].

Measurement uncertainty

Measurement uncertainty is estimated according to "Guide to the Expression of Uncertainty in Measurement" (GUM) [24] and "Expression of Measurement Uncertainty in Laboratory Medicine" [23]. Bias is considered a component of this uncertainty [25]. The relation between the standard measurement uncertainty ${u}_{m}\left(t\right)$, and the value of the measurement $t$, is typically represented as [20]:

$${u}_{m}\left(t\right)=\sqrt{{b}_{0}^{2}+{b}_{1}^{2}{t}^{2}}$$

where ${b}_{0}$ and ${b}_{1}$ are constants.

For a linear approximation, it is expressed as [20]:

$${u}_{m}\left(t\right)\cong {b}_{0}+{b}_{1}t$$
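Both forms are straightforward to evaluate. The Python sketch below uses hypothetical function names; the constants passed in the example are the values of ${b}_{0}$ and ${b}_{1}$ reported in the illustrative case study later in this article.

```python
import math

def u_m_nonlinear(t, b0, b1):
    """Standard measurement uncertainty: sqrt(b0^2 + b1^2 * t^2)."""
    return math.sqrt(b0 ** 2 + (b1 * t) ** 2)

def u_m_linear(t, b0, b1):
    """Linear approximation: b0 + b1 * t."""
    return b0 + b1 * t

# Example inputs: b0 = 0.8124 and b1 = 0.0119 at t = 126 mg/dl.
u_nl = u_m_nonlinear(126.0, 0.8124, 0.0119)
u_lin = u_m_linear(126.0, 0.8124, 0.0119)
```

For non-negative ${b}_{0}$, ${b}_{1}$, and $t$, the linear form is never smaller than the nonlinear one, and the two coincide at $t=0$.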

Sampling uncertainty of means and standard deviations

Standard uncertainty of means and standard deviations is estimated utilizing the central limit theorem and the chi-square distribution [26,27,28] as:

$${u}_{s}({m}_{P})\cong \frac{{s}_{P}}{\sqrt{{n}_{P}}}$$

$${u}_{s}({s}_{P})\cong \frac{{s}_{P}}{\sqrt{2\left({n}_{P}-1\right)}}$$

where ${m}_{P}$ and ${s}_{P}$ are the mean and standard deviation of measurements in a population sample of size ${n}_{P}$.
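These two approximations translate directly into code. In the Python sketch below, the function names and the sample values are hypothetical.

```python
import math

def u_mean(s, n):
    """Standard sampling uncertainty of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def u_sd(s, n):
    """Standard sampling uncertainty of a sample SD: s / sqrt(2 (n - 1))."""
    return s / math.sqrt(2 * (n - 1))

# Hypothetical sample: SD of 10 units.
um = u_mean(10.0, 100)   # 10 / sqrt(100) = 1.0
us = u_sd(10.0, 101)     # 10 / sqrt(200)
```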

Sampling uncertainty of prevalence or prior probability for disease

Given the numbers ${n}_{D}$ and ${n}_{\overline{D} }$ of diseased and nondiseased individuals in a population sample, the standard uncertainty of the prevalence or prior probability for disease $v=\frac{{n}_{D}}{{n}_{\overline{D} }+{n}_{D}}$ is approximated as:

$${u}_{s}(v)\cong \sqrt{\frac{\left(2+{n}_{\overline{D} }\right)\left(2+{n}_{D}\right)}{{\left(4+{n}_{\overline{D} }+{n}_{D}\right)}^{3}}}$$

using the Agresti–Coull adjustment of the Wald interval [29].
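The adjustment amounts to adding two diseased and two nondiseased individuals before applying the usual binomial standard error. A short Python sketch follows; the function name is hypothetical, and the counts are those of the illustrative case study described later (154 diabetic participants out of 976).

```python
import math

def u_prevalence(n_d, n_nd):
    """Agresti-Coull adjusted standard uncertainty of the prevalence."""
    return math.sqrt((2 + n_nd) * (2 + n_d) / (4 + n_nd + n_d) ** 3)

u_v = u_prevalence(154, 976 - 154)   # roughly 0.0117
```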

Measures combined uncertainty

When there are $l$ independent and uncorrelated components of uncertainty, each with standard uncertainty ${u}_{i}(t)$, then their standard combined uncertainty $_{l}{u}_{c}(t)$ is calculated as [23]:

$$_{l}{u}_{c}(t)=\sqrt{\sum\limits_{i=1}^{l}{\left({u}_{i}(t)\right)}^{2}}$$

If the components are correlated, then [24]:

$$_{l}{u}_{c}(t)=\sqrt{\sum_{i=1}^{l}\sum_{j=1}^{l}{u}_{i}(t){u}_{j}(t){\rho }_{ij}(t)}$$

where ${\rho }_{ij}(t)$ is the correlation coefficient between the uncertainties ${u}_{i}(t)$ and ${u}_{j}(t)$.
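Both combination rules can be written as one small Python function. The name `combined_uncertainty` and the example values are hypothetical; the correlation matrix, when supplied, is assumed to be symmetric with ones on its diagonal.

```python
import math

def combined_uncertainty(u, rho=None):
    """Combine standard uncertainties u[i]; rho is an optional correlation matrix."""
    if rho is None:   # independent, uncorrelated components: root sum of squares
        return math.sqrt(sum(u_i ** 2 for u_i in u))
    n = len(u)
    total = sum(u[i] * u[j] * rho[i][j] for i in range(n) for j in range(n))
    return math.sqrt(total)

uncorrelated = combined_uncertainty([3.0, 4.0])                               # 5.0
fully_correlated = combined_uncertainty([3.0, 4.0], [[1.0, 1.0], [1.0, 1.0]])  # 7.0
```

With an identity correlation matrix the second rule reduces to the first; with full positive correlation the components simply add.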

The standard combined uncertainty of the Bayesian diagnostic measures is computed via uncertainty propagation rules, employing a first-order Taylor series approximation [30] (refer to Supplemental File II: BayesianDiagnosticInsightsCalculations.nb). Assuming uncorrelated parameters, we use the following formula to compute uncertainty propagation [24]:

$$_{l}{u}_{c}(t)=\sqrt{\sum_{i=1}^{l}{\left(\frac{\partial g(t|{\varvec{\uptheta}})}{\partial {x}_{i}}\right)}^{2}{({u}_{i}(t) )}^{2}}$$

where $g(t|{\varvec{\uptheta}})$ is a Bayesian diagnostic measure with a parameter vector ${\varvec{\uptheta}}=({x}_{1},{x}_{2},\dots ,{x}_{l})$, $_{l}{u}_{c}(t)$ is the standard combined uncertainty of $g(t|{\varvec{\uptheta}})$, and ${u}_{i}(t)$ is the standard uncertainty of ${x}_{i}$ at $t$.

The estimated standard uncertainty of the Bayesian diagnostic measures is truncated to the $[0,1]$ range.
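The propagation formula can be illustrated with a short Python sketch. Note the assumption: the partial derivatives are approximated here by central finite differences rather than derived symbolically, so this is a numerical stand-in for the calculation in the supplemental notebook, not a reproduction of it. The function names and all input values are hypothetical.

```python
import math

def propagate_uncertainty(g, theta, u, rel_step=1e-6):
    """First-order Taylor propagation for uncorrelated parameters.

    g     : function of a list of parameters
    theta : parameter values x_1..x_l
    u     : their standard uncertainties u_1..u_l
    """
    total = 0.0
    for i, (x_i, u_i) in enumerate(zip(theta, u)):
        h = rel_step * max(abs(x_i), 1.0)
        upper, lower = list(theta), list(theta)
        upper[i] = x_i + h
        lower[i] = x_i - h
        partial = (g(upper) - g(lower)) / (2.0 * h)   # central difference
        total += (partial * u_i) ** 2
    return min(math.sqrt(total), 1.0)   # truncate to the [0, 1] range

def posterior(theta):
    """P(D | T = t) from prevalence v and the two density values at t."""
    v, f_d, f_nd = theta
    return f_d * v / (f_d * v + f_nd * (1.0 - v))

# Hypothetical parameter values and standard uncertainties.
u_c = propagate_uncertainty(posterior, [0.158, 0.020, 0.004], [0.012, 0.002, 0.0005])
```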

Measures expanded uncertainty

The effective degrees of freedom ${_{l}\nu }_{eff}(t)$ for the combined standard uncertainty $_{l}{u}_{c}(t)$ with $l$ components ${u}_{i}(t)$ with ${\nu }_{i}$ degrees of freedom each are determined using the Welch–Satterthwaite formula [31, 32]:

$${_{l}\nu }_{eff}(t)\cong \frac{{\left(_{l}{u}_{c}(t)\right)}^{4}}{\sum_{i=1}^{l}\frac{{\left({u}_{i}(t)\right)}^{4}}{{\nu }_{i}}}$$

It can be shown that if ${\nu }_{min}$ is the minimum of ${\nu }_{1},{\nu }_{2},\dots ,{\nu }_{l}$, then:

$${\nu }_{min}\le {_{l}\nu }_{eff}(t)\le \sum_{i=1}^{l}{\nu }_{i}$$
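A brief Python sketch of the Welch–Satterthwaite formula for uncorrelated components is given below, with hypothetical inputs that also illustrate the two bounds.

```python
def effective_dof(u, nu):
    """Welch-Satterthwaite effective degrees of freedom.

    u  : standard uncertainties of the components
    nu : their degrees of freedom
    """
    u_c_squared = sum(u_i ** 2 for u_i in u)   # squared combined uncertainty
    return u_c_squared ** 2 / sum(u_i ** 4 / nu_i for u_i, nu_i in zip(u, nu))

nu_equal = effective_dof([1.0, 1.0], [10, 10])   # equal components: reaches the upper bound, 20
nu_eff = effective_dof([0.3, 0.5], [9, 29])      # lies between 9 and 9 + 29 = 38
```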

The expanded combined uncertainty ${U}_{c}(t)$ at a confidence level $p$ is estimated as:

$${U}_{c}(t)\cong \left({F}_{\nu }^{-1}\left(\frac{1-p}{2}\right)_{l}{u}_{c}(t),{F}_{\nu }^{-1}\left(\frac{1+p}{2}\right)_{l}{u}_{c}(t)\right)$$

where ${F}_{\nu }^{-1}\left(z\right)$ is the inverse CDF of the Student's t-distribution with $\nu={_{l}\nu }_{eff}(t)$ degrees of freedom and $_{l}{u}_{c}(t)$ is the standard combined uncertainty of the Bayesian diagnostic measure.

Consequently, the confidence interval at the same confidence level $p$ of a Bayesian diagnostic measure with estimated value $x$ at $t$ is approximated as:

$${CI}_{p}\left(t\right)\cong \left(x+ {F}_{\nu }^{-1}\left(\frac{1-p}{2}\right)_{l}{u}_{c}(t),x+{F}_{\nu }^{-1}\left(\frac{1+p}{2}\right)_{l}{u}_{c}(t)\right)$$

The estimated confidence intervals of the Bayesian diagnostic measures are truncated to the $[0,1]$ range.
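The construction and truncation of such an interval can be sketched in Python. One substitution should be noted: the Python standard library has no Student's t quantile function, so the sketch uses the standard normal quantile instead, which is a reasonable stand-in only when the effective degrees of freedom are large. The function name and inputs are hypothetical.

```python
from statistics import NormalDist

def confidence_interval(x, u_c, p=0.95):
    """Approximate CI of a probability-valued estimate x, truncated to [0, 1].

    Assumption: normal quantiles replace the Student's t quantiles of the
    formula above (adequate only for large effective degrees of freedom).
    """
    q_low = NormalDist().inv_cdf((1.0 - p) / 2.0)
    q_high = NormalDist().inv_cdf((1.0 + p) / 2.0)
    low, high = x + q_low * u_c, x + q_high * u_c
    return max(low, 0.0), min(high, 1.0)

ci_mid = confidence_interval(0.50, 0.10)    # about (0.304, 0.696)
ci_edge = confidence_interval(0.95, 0.10)   # upper limit truncated at 1.0
```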

The software

Program overview

The software program Bayesian Diagnostic Insights was developed using the Wolfram Language with Wolfram Mathematica® Ver 14.1 (Wolfram Research, Inc., Champaign, IL, USA). It facilitates the estimation and comparison of Bayesian diagnostic measures. This interactive program is designed to estimate and plot the values, standard sampling uncertainty, measurement uncertainty, combined uncertainty, and confidence intervals of Bayesian diagnostic measures for a screening or diagnostic test (refer to Figs. 1 and 2).

The program is freely accessible as a Wolfram Language notebook (.nb) (Supplemental File I: BayesianDiagnosticInsights.nb). It can be executed using Wolfram Player® or Wolfram Mathematica® (refer to Appendix A.3). The intricate nature of the required computations necessitates substantial computational resources.

Input parameters

Parametric distributions

Users can select the distributions of the measurements for diseased and nondiseased populations from a predefined list of univariate parametric distributions:

a)
Normal distribution
b)
Lognormal distribution
c)
Gamma distribution.

Bayesian diagnostic measures

Users select the Bayesian diagnostic measures to be evaluated from the following options:

a)
The positive predictive value $P\left(D|T\ge t\right)$
b)
The negative predictive value $P\left(\overline{D }|T<t\right)$
c)
The posterior probability for disease $P\left(D|T=t\right)$
d)
The posterior probability for the absence of disease $P\left(\overline{D }|T=t\right)$

Definition of population and sample parameters and statistics

For each population, users define the mean $\mu$ and the standard deviation $\sigma$ of the measurements (in arbitrary units), along with the prior probability or prevalence $v$ of disease.

For each population sample, users define its size ${n}$, the mean ${m}$, and the standard deviation ${s}$ of the measurements (in arbitrary units).

Measurement uncertainty

Users select a linear or a nonlinear equation to describe the measurement uncertainty as a function of the measurement value $t$. They define the constant contribution ${b}_{0}$ to the standard measurement uncertainty, the proportionality constant ${b}_{1}$, and the number ${n}_{U}$ of quality control (QC) samples analyzed for its estimation.

For more details about the program's input, please refer to Appendix A.2.

Output

The program generates plots and tables detailing the diagnostic measures, including their standard sampling uncertainty, measurement uncertainty, combined uncertainty, and associated confidence intervals. By providing this extensive array of input parameters, output plots, and tables, the program offers a platform for exploring and comparing Bayesian diagnostic measures and their uncertainty using univariate parametric distributions of medical diagnostic measurands.

More detailed documentation of the program's interface is provided in Supplemental File III: BayesianDiagnosticInsightsInterface.pdf.

Illustrative case study

As previously described [22], we conducted an illustrative case study to demonstrate the program's application. We used fasting plasma glucose (FPG) as the diagnostic test measurand for the Bayesian diagnosis of diabetes mellitus (hereafter referred to as "diabetes"), with the oral glucose tolerance test (OGTT) serving as the reference method. Diabetes was diagnosed if the plasma glucose value was equal to or greater than $200\,\text{mg}/\text{dL}$, measured two hours after the oral administration of $75\,\text{g}$ of glucose during an OGTT (2-hour PG) [33]. The study focused on individuals aged 70 to 80 years, reflecting the significant correlation between age and diabetes prevalence [34].

Data was collected from the National Health and Nutrition Examination Survey (NHANES) participants from 2005 to 2016 (n = 60,936), as previously described [22]. NHANES is a comprehensive survey assessing the health and nutritional status of adults and children in the United States [35].

The inclusion criteria were valid FPG and OGTT results (n = 13,836), no prior diabetes diagnosis [36] (n = 13,465), and age 70–80 years (n = 976).

Participants with a 2-hour PG measurement $\geq200\,\text{mg}/\text{dl}$ were classified as diabetic (n = 154).

The prevalence or prior probability for diabetes, along with the probability distributions for fasting plasma glucose (FPG) in both diabetic and nondiabetic participants, was estimated using empirical Bayes methods [37]. We estimated the prevalence or prior probability for diabetes as follows:

$$v\cong \frac{154}{976}=0.158$$

The FPG datasets statistics are presented in Table 1 (hereafter, FPG and its uncertainty are expressed in $\text{mg}/\text{dl}$).

Table 1 Descriptive statistics of the datasets and the estimated lognormal distributions of the diabetic and nondiabetic participants

Lognormal distributions were employed to model FPG measurements in diabetic and nondiabetic participants using the maximum likelihood estimation method [38]. Parametrized by their means ${m}_{D}$ and ${m}_{\overline{D} }$ and standard deviations ${s}_{D}$ and ${s}_{\overline{D} }$, they were defined as:

$${L}_{D}=\text{Lognormal}\left({m}_{D},{s}_{D}\right)=\text{Lognormal}\left(120.671,17.791\right)$$

$${L}_{\overline{D} }=\text{Lognormal}\left({m}_{\overline{D} },{s}_{\overline{D} }\right)=\text{Lognormal}\left(102.642,10.747\right)$$
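Because the distributions above are specified by their mean and standard deviation, whereas many software libraries parametrize the lognormal distribution by the mean and standard deviation of the logarithm, a conversion is usually needed before evaluating PDFs or CDFs. The following Python sketch shows the standard conversion in both directions; the function names are hypothetical.

```python
import math

def lognormal_log_params(mean, sd):
    """Convert (mean, SD) of a lognormal variable to log-scale (mu, sigma)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

def lognormal_mean_sd(mu, sigma):
    """Convert log-scale (mu, sigma) back to (mean, SD)."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, sd

# Round trip with the parameters of the diabetic participants.
mu_d, sigma_d = lognormal_log_params(120.671, 17.791)
mean_d, sd_d = lognormal_mean_sd(mu_d, sigma_d)
```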

QC data for FPG measurements from NHANES for the same period (2005–2016) included 1350 QC samples. Nonlinear least squares regression [39, 40] applied to the QC data provided the following function for standard measurement uncertainty ${u}_{m}\left(t\right)$ relative to the measurement value $t$:

$${u}_{m}\left(t\right)=\sqrt{{b}_{0}^{2}+{b}_{1}^{2}{t}^{2}}=\sqrt{0.6600+0.00014{t}^{2}}$$

where ${b}_{0}=0.8124$ and ${b}_{1}=0.0119$.

We estimated the means of the standard measurement uncertainty of FPG in the diabetic and nondiabetic participants as follows:

$$\begin{array}{c}{\widehat{u}}_{D}\cong 1.665\,\text{mg}/\text{dl}\\ {\widehat{u}}_{\overline{D} }\cong 1.473\,\text{mg}/\text{dl}\end{array}$$

Consequently, we estimated the distributions of the measurements, assuming negligible measurement uncertainty, as:

$$\begin{array}{c}{d}_{D}\cong \text{Lognormal}\left({m}_{D},\sqrt{{s}_{D}^{2}-{\widehat{u}}_{D}^{2}}\right)\cong \text{Lognormal}\left(120.671,17.713\right)\\ {d}_{\overline{D} }\cong \text{Lognormal}\left({m}_{\overline{D} },\sqrt{{s}_{\overline{D} }^{2}-{\widehat{u}}_{\overline{D} }^{2}}\right)\cong \text{Lognormal}\left(102.642,10.747\right)\end{array}$$

Table 1 presents the descriptive statistics of the estimated lognormal distributions for diabetic and nondiabetic participants and the respective p-values from the Cramér–von Mises goodness-of-fit test [41]. This test assesses the goodness-of-fit by comparing the empirical CDFs of the measurement samples with those of the estimated distributions. The calculated p-values indicate that any observed differences between the empirical data and the estimated distributions can be attributed to random sampling variability, suggesting that the lognormal distributions provide an acceptable fit to the FPG measurements in both groups.

Figures 3 and 4 show the estimated PDFs of FPG in the diabetic and nondiabetic participants, assuming a lognormal distribution and negligible measurement uncertainty, along with the histograms of the respective NHANES datasets.

Likelihoods and posterior probabilities were estimated accordingly.

Results

The results of applying the program to the illustrative case study data are presented in Figs. 5–19, and the program settings are detailed in Tables 2 and 3.

Table 2 The settings of the program Bayesian Diagnostic Insights for Figs. 5, 6, 7, 8 and 9

Table 3 The settings of the program Bayesian Diagnostic Insights for Figs. 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19

Measures

Figure 5 displays the plots of:

a)
Positive predictive value $P\left(D|T\ge t\right)$ of FPG for diabetes versus threshold value $t$ (mg/dl) (orange curve). The curve is smooth, increasing monotonically, and approximately sigmoidal. $P\left(D|T\ge t\right)$ is asymptotically equal to the prevalence of diabetes for lower values of $t$, then rises rapidly to approach an asymptote at $1.00$.
b)
Posterior probability for diabetes versus FPG value $t$ $(\text{mg}/\text{dl})$ (blue curve). The curve is smooth and approximately double sigmoidal. For $t=86.7\,\text{mg}/\text{dl}$, $P\left(D|T=t\right)$ has a minimum value of $0.04$. $P\left(D|T=t\right)$ is asymptotically equal to $1.00$ for very low and very high values of $t$, decreasing rapidly to its minimum before increasing rapidly again.

Figure 6 presents the plots of:

a)
The negative predictive value $P\left(\overline{D }|T<t\right)$ of FPG for diabetes versus threshold value $t$ $(\text{mg}/\text{dl})$ (orange curve). The curve is smooth and unimodal, with a maximum value of $0.96$ at $t=91.0\,\text{mg}/\text{dl}$. $P\left(\overline{D }|T<t\right)$ is asymptotically equal to $0.00$ for lower values of $t$, then rises rapidly to its maximum and becomes asymptotically equal to $1.00-v$, where $v$ is the prevalence of diabetes.
b)
The posterior probability $P\left(\overline{D }|T=t\right)$ for the absence of diabetes versus FPG value $t$ $(\text{mg}/\text{dl})$ (blue curve). The curve is smooth, unimodal, and approximately double sigmoidal. For an FPG value $t=86.7\,\text{mg}/\text{dl}$, $P\left(\overline{D }|T=t\right)$ has a maximum value of $0.96$. $P\left(\overline{D }|T=t\right)$ is asymptotically equal to $0.00$ for lower and higher values of $t$.

Additionally:

a)
For $t=67.0\,\text{mg}/\text{dl}$, we have $P\left(D|T\ge t\right)=P\left(D|T=t\right)=0.158=v$.
b)
For $t<67.0\,\text{mg}/\text{dl}$, we have $P\left(D|T\ge t\right)<P\left(D|T=t\right)$.
c)
For $t>67.0\,\text{mg}/\text{dl}$, we have $P\left(D|T\ge t\right)>P\left(D|T=t\right)$.
d)
For $t=91.0\,\text{mg}/\text{dl}$, we have $P\left(\overline{D }|T<t\right)=P\left(\overline{D }|T=t\right)=0.96$.
e)
For $t<91.0\,\text{mg}/\text{dl}$, we have $P\left(\overline{D }|T<t\right)<P\left(\overline{D }|T=t\right)$.
f)
For $t>91.0\,\text{mg}/\text{dl}$, we have $P\left(\overline{D }|T<t\right)>P\left(\overline{D }|T=t\right)$.

As shown in Figs. 7 and 8, for an FPG value $t=126.0\,\text{mg}/\text{dl}$ and varying prevalence $0.0<v<1.0$:

a)
Both $P\left(D|T\ge t\right)$ and $P\left(D|T=t\right)$ curves are smooth, starting from a probability asymptotically equal to $0.00$, monotonically increasing as prevalence increases.
b)
Both $P\left(\overline{D }|T<t\right)$ and $P\left(\overline{D }|T=t\right)$ curves are smooth, starting from a probability asymptotically equal to $1.00$, monotonically decreasing as prevalence increases.
c)
It is observed that $P\left(D|T\ge t\right)>P\left(D|T=t\right)$ and $P\left(\overline{D }|T<t\right)>P\left(\overline{D }|T=t\right)$.

Figure 9 shows a table of the Bayesian diagnostic measures for an FPG value $t=126\,\text{mg}/\text{dl}$, the established threshold for the diagnosis of diabetes [42], assuming normal, lognormal, and gamma distributions of FPG.

Uncertainty

Figure 10 shows the plots of:

a)
The standard sampling, measurement, and combined uncertainty of the positive predictive value for diabetes versus FPG value $t$ $(\text{mg}/\text{dl})$. The curves are smooth and unimodal.
b)
The standard sampling, measurement, and combined uncertainty of the posterior probability for diabetes versus FPG value $t$ $(\text{mg}/\text{dl})$. The curves are smooth and bimodal.

Figure 11 shows the plots of:

a)
The standard sampling, measurement, and combined uncertainty of the negative predictive value for diabetes versus FPG value $t$ $(\text{mg}/\text{dl})$. The curves are smooth and unimodal.
b)
The standard sampling, measurement, and combined uncertainty of the posterior probability for the absence of diabetes versus FPG value $t$ $(\text{mg}/\text{dl})$. The curves are smooth and bimodal.

In the assessment of the combined standard uncertainty of posterior probability for diabetes ${u}_{c}[P\left(D|T=t\right)]$ and for the absence of diabetes ${u}_{c}\left[P\left(\overline{D }|T=t\right)\right]$:

a)
They are equal.
b)
They are substantially affected by the measurement uncertainty of FPG.
c)
Two local maxima are observed, corresponding to the regions near the steepest segments of the posterior probability curves, which exhibit an approximately double sigmoidal configuration. The maxima are quantitatively approximated as follows:
a.
At an FPG value of $t=58.5\,\text{mg}/\text{dl}$, the combined standard uncertainty is $0.898$, where $P\left(D|T=t\right)=0.581$ and $P\left(\overline{D }|T=t\right)=0.419$.
b.
At an FPG value of $t=133.1\,\text{mg}/\text{dl}$, the combined standard uncertainty is $0.190$, where $P\left(D|T=t\right)=0.726$ and $P\left(\overline{D }|T=t\right)=0.274$.
c.
The standard combined uncertainty ${u}_{c}[P\left(D|T\ge t\right)]$ of the positive predictive value for diabetes of FPG has a maximum value of $0.150$ for $t=126.0\,\text{mg}/\text{dl}$, where $P\left(D|T\ge t\right)=0.758$.
d.
The standard combined uncertainty ${u}_{c}[P\left(\overline{D }|T<t\right)]$ of the negative predictive value for diabetes has a maximum value of $0.900$ for $t=58.5\,\text{mg}/\text{dl}$, where $P\left(\overline{D }|T<t\right)=0.321$.
e.
This pattern indicates heightened uncertainty in the regions where the curves of the diagnostic measures have their most pronounced inflections (Figs. 5 and 6).

In addition:

a)
For $t=95.7\,\text{mg}/\text{dl}$, we have ${u}_{c}\left[P\left(D|T\ge t\right)\right]={u}_{c}\left[P\left(D|T=t\right)\right]=0.013$, while $P\left(D|T\ge t\right)=0.193$ and $P\left(D|T=t\right)=0.049$.
b)
For $t=126.7\,\text{mg}/\text{dl}$, we have ${u}_{c}\left[P\left(D|T\ge t\right)\right]={u}_{c}\left[P\left(D|T=t\right)\right]=0.149$, while $P\left(D|T\ge t\right)=0.774$ and $P\left(D|T=t\right)=0.517$.
c)
For $0<t<95.7\,\text{mg}/\text{dl}$ and $t>126.7\,\text{mg}/\text{dl}$, we have ${u}_{c}\left[P\left(D|T\ge t\right)\right]<{u}_{c}\left[P\left(D|T=t\right)\right]$.
d)
For $95.7\,\text{mg}/\text{dl}<t<126.7\,\text{mg}/\text{dl}$, we have ${u}_{c}\left[P\left(D|T\ge t\right)\right]>{u}_{c}\left[P\left(D|T=t\right)\right]$.
e)
For $t=59.1\,\text{mg}/\text{dl}$, we have ${u}_{c}\left[P\left(\overline{D }|T<t\right)\right]={u}_{c}\left[P\left(\overline{D }|T=t\right)\right]=0.887$, while $P\left(\overline{D }|T<t\right)=0.362$ and $P\left(\overline{D }|T=t\right)=0.463$.
f)
For $t=103.8\,\text{mg}/\text{dl}$, we have ${u}_{c}\left[P\left(\overline{D }|T<t\right)\right]={u}_{c}\left[P\left(\overline{D }|T=t\right)\right]=0.015$, while $P\left(\overline{D }|T<t\right)=0.947$ and $P\left(\overline{D }|T=t\right)=0.921$.
g)
For $0<t<59.1\,\text{mg}/\text{dl}$ and $t>103.8\,\text{mg}/\text{dl}$, we have ${u}_{c}\left[P\left(\overline{D }|T<t\right)\right]<{u}_{c}\left[P\left(\overline{D }|T=t\right)\right]$.
h)
For $59.1\,\text{mg}/\text{dl}<t<103.8\,\text{mg}/\text{dl}$, we have ${u}_{c}\left[P\left(\overline{D }|T<t\right)\right]>{u}_{c}\left[P\left(\overline{D }|T=t\right)\right]$.

The confidence intervals are affected accordingly (refer to Figs. 12 and 13):

a)
The confidence intervals of Bayesian posterior probability $P\left(D|T=t\right)$ for diabetes (blue curves) are narrower for both lower and higher values of $t$.
b)
The confidence intervals of positive predictive value $P\left(D|T\ge t\right)$ (orange curves) narrow considerably for lower values of $t$.
c)
The confidence intervals of Bayesian posterior probability $P\left(\overline{D }|T=t\right)$ for the absence of diabetes (blue curves) are wider at the extremes of the $t$ spectrum.
d)
The confidence intervals of negative predictive value $P\left(\overline{D }|T<t\right)$ (orange curves) are wide at lower $t$ values, to become considerably narrower at higher values.

For an FPG value $t=126\,\text{mg}/\text{dl},$ Figs. 14 and 15 show the plots of the standard sampling, measurement, and combined uncertainty of positive predictive value, the posterior probability for diabetes, the negative predictive value, and the posterior probability for the absence of diabetes versus prior probability or prevalence of diabetes $v$. The combined uncertainty of the diagnostic measures is substantially affected by the measurement uncertainty of FPG. The curves are unimodal, with maxima approximately:

a)
For $v=0.055,\,{u}_{c}\left[P\left(D|T\ge t\right)\right]=0.205$ where $P\left(D|T\ge t\right)=0.493$.
b)
For $v=0.158,\,{u}_{c}\left[P\left(D|T=t\right)\right]=0.141$ where $P\left(D|T=t\right)=0.494$.
c)
For $v=0.631,\,{u}_{c}\left[P\left(\overline{D }|T<t\right)\right]=0.023$ where $P\left(\overline{D }|T<t\right)=0.471$.
d)
For $v=0.158,\,{u}_{c}\left[P\left(\overline{D }|T=t\right)\right]=0.141$ where $P\left(\overline{D }|T=t\right)=0.506$.

The local maxima indicate heightened uncertainty in regions where the diagnostic measures curves have their most pronounced inflections (refer to Figs. 7 and 8).

Additionally:

a)
For $v=0.173$ we have ${u}_{c}\left[P\left(D|T\ge t\right)\right]={u}_{c}\left[P\left(D|T=t\right)\right]=0.141$, while $P\left(D|T\ge t\right)=0.777$ and $P\left(D|T=t\right)=0.521$.
b)
For $0<v<0.173$ we have ${u}_{c}\left[P\left(D|T\ge t\right)\right]>{u}_{c}\left[P\left(D|T=t\right)\right]$.
c)
For $0.173 <v<1.0$ we have ${u}_{c}\left[P\left(D|T\ge t\right)\right]<{u}_{c}\left[P\left(D|T=t\right)\right]$.
d)
For $0<v<1.0$ we have ${u}_{c}\left[P\left(\overline{D }|T<t\right)\right]<{u}_{c}\left[P\left(\overline{D }|T=t\right)\right]$.

Notably, the combined uncertainty of the negative predictive value is considerably less than the combined uncertainty of the posterior probability for the absence of diabetes.

The confidence intervals are adjusted accordingly (refer to Figs. 16 and 17):

a)
The confidence intervals of Bayesian posterior probability $P\left(D|T=t\right)$ for diabetes (Fig. 16, blue curves), positive predictive value $P\left(D|T\ge t\right)$ (Fig. 16, orange curves), Bayesian posterior probability $P\left(\overline{D }|T=t\right)$ for the absence of diabetes (Fig. 17, blue curves) and negative predictive value $P\left(\overline{D }|T<t\right)$ (Fig. 17, orange curves) are narrowest at both lower and higher prevalences.
b)
The confidence intervals of $P\left(D|T\ge t\right)$ (Fig. 16, orange curves) are generally narrower than those of $P\left(D|T=t\right)$ (Fig. 16, blue curves).
c)
The confidence intervals of $P\left(\overline{D }|T<t\right)$ (Fig. 17, orange curves) are considerably narrower than those of $P\left(\overline{D }|T=t\right)$ (Fig. 17, blue curves).

Figures 18 and 19 present tables of Bayesian diagnostic measures for FPG measurements at the diabetes diagnostic threshold $t=126\,\text{mg}/\text{dl}$, following the American Diabetes Association (ADA) guidelines. The standard for diagnosing diabetes used in this study is OGTT with a $200\,\text{mg}/\text{dl}$ threshold. The limited concordance between these two diagnostic thresholds is evident from the point estimations and their associated uncertainty. For an FPG diagnostic threshold $t=126\,\text{mg}/\text{dl}$:

a)
$P\left(D|T\ge t\right)=0.758$, with a confidence interval of $(0.465 - 1.000)$.
b)
$P\left(D|T=t\right)=0.494$, with a confidence interval of $(0.217 - 0.770)$.
c)
$P\left(\overline{D }|T<t\right)=0.890$, with a confidence interval of $(0.868 - 0.912)$.
d)
$P\left(\overline{D }|T=t\right)=0.506$, with a confidence interval of $(0.230 - 0.783)$.
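
The relation between these intervals and the combined standard uncertainties reported earlier can be illustrated with a minimal sketch. It assumes a symmetric interval, estimate ± k·u_c, truncated to the $[0,1]$ range, with an assumed coverage factor k = 1.96 (normal approximation); the program itself derives its intervals from the t-distribution, so the function name `truncated_interval` and the value of `k` are illustrative assumptions rather than the program's implementation.

```python
def truncated_interval(estimate, u_c, k=1.96):
    """Return (lower, upper) = estimate -/+ k * u_c, clipped to [0, 1].

    k = 1.96 is an assumed coverage factor (normal approximation); a
    t-distribution quantile is larger when the degrees of freedom are few.
    """
    lower = max(0.0, estimate - k * u_c)
    upper = min(1.0, estimate + k * u_c)
    return lower, upper


# Point estimates and combined standard uncertainties quoted in the text
# for t = 126 mg/dl
ppv_lower, ppv_upper = truncated_interval(0.758, 0.150)
post_lower, post_upper = truncated_interval(0.494, 0.141)

print(f"P(D|T>=t): ({ppv_lower:.3f} - {ppv_upper:.3f})")
print(f"P(D|T=t):  ({post_lower:.3f} - {post_upper:.3f})")
```

Under these assumptions the sketch gives bounds close to the tabulated ones; small differences are expected from rounding of the quoted uncertainties and from the assumed coverage factor. The upper bound of the positive predictive value interval is an effect of truncation, not of the estimate itself.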

Therefore:

a)
$P\left(D|T=t\right)<P\left(D|T\ge t\right)$
b)
The sizes of the confidence intervals of $P\left(D|T\ge t\right)$ and $P\left(D|T=t\right)$ are comparable.
c)
There is a considerable overlap between the confidence intervals of $P\left(D|T\ge t\right)$ and $P\left(D|T=t\right)$.
d)
$P\left(\overline{D }|T=t\right)<P\left(\overline{D }|T<t\right)$
e)
The size of the confidence interval of $P\left(\overline{D }|T<t\right)$ is considerably less than the size of the confidence interval of $P\left(\overline{D }|T=t\right)$.
f)
There is no overlap between the confidence intervals of $P\left(\overline{D }|T<t\right)$ and $P\left(\overline{D }|T=t\right)$.

In addition, the table with the standard uncertainty of the Bayesian diagnostic measures of Fig. 18 shows that for $t=126\,\text{mg}/\text{dl},$ measurement uncertainty is the main component of their combined uncertainty.
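
A short worked relation may help in reading that table. Assuming, as is conventional in the GUM framework, that the sampling and measurement components are uncorrelated and combine in quadrature (the symbols $u_s$ and $u_m$ are introduced here for illustration), the combined standard uncertainty of a diagnostic measure $P$ is

$$u_c\left[P\right]=\sqrt{u_s^2\left[P\right]+u_m^2\left[P\right]}$$

Under this assumption, when $u_m\left[P\right]$ is several times larger than $u_s\left[P\right]$, $u_c\left[P\right]\approx u_m\left[P\right]$. For example, if $u_m=3u_s$, then $u_c=\sqrt{10}\,u_s\approx 1.05\,u_m$, so a larger sample alone would barely reduce the combined uncertainty.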

All the figures provided by the program about the illustrative case study data are presented in Supplemental file IV: BayesianDiagnosticInsightsFigures.pdf.

Discussion

There is a persistent need to estimate diagnostic measures and their uncertainty, especially concerning screening and diagnostic tests for potentially life-threatening diseases. The COVID-19 pandemic has highlighted this necessity [43,44,45,46,47,48].

Traditional diagnostic approaches often rely on fixed thresholds, which may overlook certain aspects of disease pathology. While historically influential, these methods may lack the comprehensive perspective required in modern patient-centered medicine. The continuous evolution of disease progression and changing patient demographics further complicate the diagnostic process, challenging the limits of traditional methods. In this context, Bayesian inference emerges as a viable alternative, offering probabilistic assessments tailored to individual patient profiles [4, 49]. Bayes' theorem provides the statistical framework for this approach: it combines prior knowledge with new information or test results to update the estimated probability of disease.

We developed the software tool introduced in this study to facilitate the application of Bayes' theorem in medical diagnosis. It allows for the exploration and comparison of two pairs of Bayesian diagnostic measures for screening or diagnostic tests, assuming parametric distributions of the measurements:

a)
The positive predictive value and the posterior probability for disease and
b)
The negative predictive value and the posterior probability for the absence of disease.
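
The two pairs of measures differ only in the event that is conditioned on: a single measurement value ($T=t$) for the posterior probabilities, and a threshold ($T\ge t$ or $T<t$) for the predictive values. The following is a minimal sketch of that distinction, not the program's Wolfram Language code. It assumes lognormal measurement distributions in both populations; the function names and all parameter values are hypothetical and are not the NHANES estimates.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Density of a variable whose natural logarithm is N(mu, sigma^2)."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(t, mu, sigma):
    """Cumulative distribution function of the same variable."""
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def bayesian_measures(t, v, diseased, nondiseased):
    """Apply Bayes' theorem for a value or threshold t and prevalence v.

    diseased and nondiseased are (mu, sigma) pairs on the log scale.
    """
    f_d, f_n = lognormal_pdf(t, *diseased), lognormal_pdf(t, *nondiseased)
    F_d, F_n = lognormal_cdf(t, *diseased), lognormal_cdf(t, *nondiseased)
    posterior_d = v * f_d / (v * f_d + (1 - v) * f_n)            # P(D|T=t)
    ppv = v * (1 - F_d) / (v * (1 - F_d) + (1 - v) * (1 - F_n))  # P(D|T>=t)
    npv = (1 - v) * F_n / ((1 - v) * F_n + v * F_d)              # P(not D|T<t)
    return {
        "P(D|T=t)": posterior_d,
        "P(notD|T=t)": 1 - posterior_d,
        "P(D|T>=t)": ppv,
        "P(notD|T<t)": npv,
    }


# Hypothetical log-scale parameters, chosen only for illustration
DISEASED = (math.log(125.0), 0.25)
NONDISEASED = (math.log(100.0), 0.10)

measures = bayesian_measures(t=126.0, v=0.15, diseased=DISEASED, nondiseased=NONDISEASED)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative parameters the predictive value at the threshold exceeds the posterior probability at the same measurement value, because the threshold event pools all results beyond $t$.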

Academic publications that thoroughly explore the statistical distributions of diagnostic test measurements in diseased and nondiseased populations are limited [50]. Therefore, exploratory data analysis and fitting of statistical distributions to diagnostic measurement data may be necessary to apply the software tool effectively [51]. Our previously developed Bayesian Diagnosis program may be helpful in this regard [4].

Our choice of parametric distributions was motivated by their broad applicability in modeling medical diagnostic measurements:

a) Normal distribution

A normal distribution is suited for data symmetric around the mean, indicating minimal skewness. This distribution assumes that data points are equally likely to occur on either side of the mean, forming the well-known bell curve.

b) Lognormal distribution

A lognormal distribution is appropriate for modeling positively skewed data, where the logarithm of the variable follows a normal distribution. Defined by a location parameter and a scale parameter of the underlying normal distribution, it can model data that cannot assume negative values and exhibit a long right tail, such as many biological measurements.

c) Gamma distribution

The gamma distribution is suitable for data whose skewness and kurtosis a lognormal distribution cannot adequately model. It is characterized by a shape parameter and a scale parameter, whose flexibility allows it to represent a wide range of data behaviors.
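
When a population is summarized only by a mean and a standard deviation, the parameters named above can be obtained by moment matching. The sketch below shows the standard conversions; it is an illustration under that assumption and does not claim to reproduce how the program fits or parameterizes its distributions. The function names and the example values are hypothetical.

```python
import math


def lognormal_params(mean, sd):
    """Log-scale location and scale (mu, sigma) matching a given mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def gamma_params(mean, sd):
    """Shape and scale of a gamma distribution matching a given mean and SD."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


# Hypothetical summary statistics, for illustration only
mu, sigma = lognormal_params(mean=100.0, sd=15.0)
shape, scale = gamma_params(mean=100.0, sd=15.0)
print(f"lognormal: mu={mu:.4f}, sigma={sigma:.4f}")
print(f"gamma: shape={shape:.4f}, scale={scale:.4f}")
```

Both conversions require a positive mean, which is consistent with the restriction of these two distributions to positive measurements.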

In our illustrative case study, we implemented an empirical Bayesian approach due to several advantages:

a) Adaptability

It can adapt to the specific characteristics of the dataset, making it more flexible and applicable to diverse clinical settings.

b) Robustness

Using empirical data to inform the prior mitigates the risk of bias introduced by subjective prior choices.

c) Computational efficiency

Estimating the prior from data reduces the computational burden compared to fully Bayesian methods that require specifying and integrating complex prior distributions.

Estimating the uncertainty inherent in diagnostic measures is a considerable challenge in medical diagnostics [21, 22, 52]. This challenge is particularly pronounced in medical decision-making for potentially life-threatening conditions. Assessing uncertainty is vital for ensuring reliable diagnoses and appropriate clinical interventions. Several notable examples of diagnostic measures where uncertainty estimation is critical include:

a) Cardiac troponin for diagnosing myocardial injury and infarction

Cardiac troponin is a crucial biomarker for diagnosing myocardial injury and infarction [53].

b) Natriuretic peptides for diagnosing heart failure

Natriuretic peptides, such as B-type natriuretic peptide (BNP) and N-terminal pro-B-type natriuretic peptide (NT-proBNP), are essential in diagnosing heart failure [54].

c) D-dimer for diagnosing thromboembolic events

The measurement of D-dimer levels plays a crucial role in diagnosing thromboembolic events, such as deep vein thrombosis and pulmonary embolism [55].

d) FPG, OGTT, and glycated hemoglobin (HbA1c) for diagnosing diabetes

Diagnosing diabetes relies on measuring blood glucose levels through tests like FPG, OGTT, and HbA1c [42].

e) OGTT for diagnosing gestational diabetes

OGTT is the standard diagnostic tool for gestational diabetes and is vital for the health of both the mother and the developing fetus [56].

f) Thyroid stimulating hormone (TSH), free serum triiodothyronine (T3), and free serum thyroxine (T4) for diagnosing thyroid dysfunction

Measurement of thyroid function tests, including TSH, free T3, and free T4, is essential for diagnosing thyroid dysfunction [57].

Our software allows the estimation and plotting of the sampling, measurement, and combined uncertainty of Bayesian diagnostic measures and their confidence intervals.

Confidence interval plots serve multiple purposes:

a) Precision assessment

They provide insights into the precision of probability estimates at different measurement levels [58].

b) Decision-making support

For clinical decision-making, these plots can highlight the measurement thresholds where the probability for disease shifts significantly, guiding interventions or further testing.

c) Epidemiological insights

In epidemiological studies, understanding how disease probability varies across a population's measurement spectrum helps identify risk factors and inform public health strategies.

Quantifying diagnostic uncertainty is imperative in laboratory medicine to define analytical performance specifications, manage quality and risk, and design and implement test accuracy studies [59]. However, despite extensive research on Bayesian diagnosis and uncertainty, their intersection remains relatively unexplored [60, 61].

The illustrative case study aimed to minimize age-related variations in disease prevalence by focusing on individuals aged 70 to 80 years. This focus demonstrates the considerations required in modern diagnostics, where factors such as age, genetics, and lifestyle choices must be accounted for in the diagnostic equation. This case study underscores the substantial impact of combined uncertainty on the diagnostic process, highlighting the predominant role of measurement uncertainty and the challenges in enhancing diagnostic accuracy. Improving the analytical methods of screening and diagnostic tests could enable the medical community to achieve more accurate diagnoses, facilitating more effective and personalized patient care.

A detailed analysis of Figs. 5, 6, 7, 8, 12, 13, 16, and 17 from the illustrative case study reveals several clinical implications:

a)
Influence of threshold and prevalence on positive predictive value

The positive predictive value $P\left(D|T\ge t\right)$ is highly influenced by the chosen threshold and the prevalence of diabetes, emphasizing the importance of selecting the appropriate cut-off for accurate diagnosis.
b)
Double-threshold pattern in posterior probability

The double-threshold pattern observed in the Bayesian posterior probability $P\left(D|T=t\right)$ for diabetes suggests the need to understand the pathological implications of different FPG levels for tailored diagnostic strategies.
c)
Variability in confidence intervals at intermediate FPG levels

The variability in confidence intervals of both $P\left(D|T\ge t\right)$ and $P\left(D|T=t\right)$ at intermediate FPG levels suggests an increased risk of false positives or false negatives. This variability could result in unnecessary treatments or missed diagnoses, highlighting the importance of carefully interpreting test results within this range.
d)
Significance of threshold selection for negative predictive value

The differing trends in negative predictive value $P\left(\overline{D }|T<t\right)$ highlight the significance of selecting the appropriate threshold for excluding diabetes.
e)
Unique behavior of posterior probability for absence of disease

The unique behavior of Bayesian posterior probability $P\left(\overline{D }|T=t\right)$ for the absence of diabetes at lower FPG values, and the variability in its confidence intervals at both lower and higher FPG values impact diagnostic decisions, necessitating careful interpretation.
f)
Robustness of negative predictive value

Despite the interpretative challenges of $P\left(\overline{D }|T<t\right)$ at lower FPG values, it is generally more robust than $P\left(\overline{D }|T=t\right)$ at higher FPG values.

The tables in Figs. 18 and 19:

a)
Indicate limited concordance between the diabetes classification criteria derived from the OGTT and FPG tests, consistent with findings previously reported in the literature [62, 63].
b)
Show that for FPG and diabetes, the point estimation of each Bayesian posterior probability is substantially less than the respective predictive value.

The discrepancies between FPG and OGTT thresholds for diagnosing diabetes highlight the need for a careful and comprehensive approach in clinical practice. By implementing combined testing strategies, repeat testing protocols, and informed clinical judgment, healthcare providers can improve diagnostic accuracy and patient outcomes. Further research and patient education are also necessary in addressing the challenges posed by the limited concordance between these diagnostic methods and their considerable uncertainty.

Our approach integrates frequentist methods for uncertainty quantification due to their established reliability and ease of implementation in clinical settings. This empirical Bayesian framework allows for the practical application of Bayes' theorem while leveraging the robustness of frequentist techniques for estimating sampling and measurement uncertainty.

Future research should focus on improving the estimations of the uncertainty of Bayesian diagnostic measures of different measurands under a diverse array of clinically and laboratory-relevant parameter settings. Furthermore, the full implementation of Bayesian methods for all aspects of uncertainty quantification could be explored, including the use of Bayesian hierarchical models [7, 64]. Additionally, applying Bayes factors to compare the evidence provided by different diagnostic measures represents a promising area for further investigation [65, 66]. These advancements could enhance the robustness and applicability of Bayesian methods in medical diagnostics, overcoming their current limitations [17, 67].

To transition from research to practical application, clinical decision analysis, cost-effectiveness studies, and research on risk assessment and quality of care, including implementation studies, are required [68]. These efforts are essential for addressing the complex issues in diagnostic medicine and developing new and effective strategies to overcome ongoing challenges.

All major general or medical statistical software packages (JASP® ver. 0.19.1, Mathematica® ver. 14.1, Matlab® ver. R2024a, MedCalc® ver. 23.0.2, metRology ver. 1.1-3, NCSS® ver. 24.0.3, NIST Uncertainty Machine ver. 1.6.2, OpenBUGS ver. 3.2.3, R ver. 4.4.1, SAS Viya® ver. 2024.09, SPSS® ver. 30.0.0, Stan ver. 2.35, Stata® ver. 19, and UQLab ver. 2.0) include routines for calculating and plotting various diagnostic measures and their confidence intervals. However, the program presented in this work provides 34 types of plots and 16 types of comprehensive tables of the four Bayesian diagnostic measures, their uncertainty, and the associated confidence intervals (Fig. 1), many of which are novel. To the best of our knowledge, neither the programs mentioned above nor any other software offers this extensive range of plots and tables without requiring advanced statistical programming.

The program complements our previously published tools for exploring diagnostic measures and posterior probability for disease and their uncertainty [4, 21, 22, 69], facilitating their comparison.

Limitations of the program

This program's limitations, which provide paths for further research, include:

a)
Underlying assumptions
a.
  Existence of "gold standards" in diagnostics: The program assumes the availability of a "gold standard" for disease classification. Without a "gold standard", alternative approaches like latent class models or expert consensus methods may be necessary [70,71,72,73].
b.
  Assumption of specific distributions: The tool assumes that the measurements or their transformations follow normal, lognormal, or gamma distributions. While these distributions are often used for biomedical data, they may not accurately represent the underlying data characteristics. Literature on reference intervals, diagnostic thresholds, and clinical decision limits provides alternative distribution models that could be considered [74,75,76,77,78].
c.
  Assumption of bimodality: The program generally assumes that the measurements are bimodally distributed, corresponding to diseased and nondiseased populations. However, in some cases, a unimodal distribution might be more appropriate [79, 80].
b)
Approximations used for the estimations
a.
  Uncertainty approximation in disease prevalence: The uncertainty associated with a disease's prevalence is approximated using the Agresti–Coull-adjusted Wald interval. Although this method is widely used, more accurate techniques are available, especially for small sample sizes or extreme probabilities [81].
b.
  Sampling uncertainty approximations: The program's approximations of the sampling uncertainty of sample means and standard deviations may be less reliable for small sample sizes or when the data exhibit significant skewness, as is often the case with lognormal and gamma distributions [82, 83].
c.
  First-order Taylor series approximations: The program employs first-order Taylor series approximations for uncertainty propagation. While this method simplifies calculations, it may not capture the complexity of uncertainty in nonlinear functions. Higher-order approximations or Monte Carlo simulations could provide more accurate results [24, 84].
d.
  Confidence intervals based on the t-distribution: Confidence intervals are derived using the t-distribution, which, despite the high relative uncertainty [85], is a practical choice in selected scenarios, particularly in metrology [7, 17, 67, 86]. Alternatives like credible intervals in a Bayesian framework could provide more accurate uncertainty quantification of nonlinear functions, especially for small samples.
e.
  Truncation to the $[0,1]$ range: The estimated standard uncertainty and the confidence intervals are truncated to the $[0,1]$ range, since probabilities cannot logically assume values less than zero or greater than one. However, this approach may distort the uncertainty representation. Quantile-derived credible intervals inherently avoid truncation by constructing intervals within the $[0,1]$ range.
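
Several of the approximations listed above can be seen together in a small sketch. It assumes a single uncertain input, the prevalence $v$, whose standard uncertainty comes from the Agresti–Coull adjustment, and propagates it through the posterior probability with a first-order Taylor term evaluated by a central finite difference. The counts, the likelihood ratio, and the function names are hypothetical; the program itself propagates more input quantities and works with analytical expressions, so this is an illustration of the technique only.

```python
import math


def agresti_coull(x, n, z=1.96):
    """Agresti-Coull adjusted proportion and its standard uncertainty."""
    n_adj = n + z * z
    p_adj = (x + 0.5 * z * z) / n_adj
    return p_adj, math.sqrt(p_adj * (1.0 - p_adj) / n_adj)


def posterior(v, likelihood_ratio):
    """P(D|T=t) from prevalence v and the density ratio f_D(t) / f_notD(t)."""
    return v * likelihood_ratio / (v * likelihood_ratio + 1.0 - v)


def first_order_uncertainty(f, x, u_x, h=1e-6):
    """First-order Taylor propagation for one input: |f'(x)| * u_x."""
    derivative = (f(x + h) - f(x - h)) / (2.0 * h)
    return abs(derivative) * u_x


# Hypothetical data: 150 diseased among 1000 sampled, density ratio of 5 at t
v, u_v = agresti_coull(x=150, n=1000)
lr = 5.0
p = posterior(v, lr)
u_p = first_order_uncertainty(lambda q: posterior(q, lr), v, u_v)
u_p = min(max(u_p, 0.0), 1.0)  # truncation of the uncertainty to [0, 1]

print(f"v = {v:.4f}, u(v) = {u_v:.4f}")
print(f"P(D|T=t) = {p:.4f}, u[P(D|T=t)] = {u_p:.4f}")
```

Because the posterior probability is a nonlinear function of $v$, the first-order term is accurate only when $u(v)$ is small relative to the curvature of the function, which is the limitation noted in item c.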

Although addressing these limitations would considerably increase computational complexity, doing so represents a critical area for future enhancement [84, 87]. We should, however, keep in mind that "all models will be based on assumptions and can only approach complex reality" [88], as "all models are wrong, but some models are useful" [89].
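
As a rough indication of what one such enhancement could look like, the sketch below replaces first-order propagation with a Monte Carlo simulation and reads the interval from sample quantiles, which lie within the $[0,1]$ range without truncation. It assumes a normally distributed prevalence with hypothetical mean and standard uncertainty, rejects draws outside $(0,1)$, and uses a hypothetical likelihood ratio; it is a percentile interval under these assumptions, not a full Bayesian credible interval and not a feature of the program.

```python
import random
import statistics


def posterior(v, likelihood_ratio):
    """P(D|T=t) from prevalence v and the density ratio f_D(t) / f_notD(t)."""
    return v * likelihood_ratio / (v * likelihood_ratio + 1.0 - v)


def monte_carlo_interval(v, u_v, likelihood_ratio, draws=20000, seed=1):
    """Approximate 95% percentile interval of the posterior probability."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < draws:
        q = rng.gauss(v, u_v)
        if 0.0 < q < 1.0:  # keep only valid prevalence values
            samples.append(posterior(q, likelihood_ratio))
    # n=40 gives 39 cut points; the first and last are the 2.5% and 97.5% quantiles
    cuts = statistics.quantiles(samples, n=40)
    return cuts[0], cuts[-1]


# Hypothetical inputs, for illustration only
mc_lower, mc_upper = monte_carlo_interval(v=0.15, u_v=0.011, likelihood_ratio=5.0)
print(f"95% percentile interval: ({mc_lower:.3f} - {mc_upper:.3f})")
```

The cost is visible even in this toy case: thousands of function evaluations replace a single derivative, and the cost grows with the number of uncertain inputs.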

Limitations of the case study

The primary limitations of the case study are:

a)
Dependence on the OGTT as the reference method for diagnosing diabetes mellitus, despite various factors affecting glucose tolerance [90,91,92,93,94,95,96,97,98].
b)
Approximation of the distributions of the FPG measurements in the NHANES datasets by lognormal distributions.
c)
The implied assumption of simple random sampling.

Conclusion

Bayesian Diagnostic Insights provides modules for estimating, visualizing, and comparing Bayesian diagnostic measures, including their associated uncertainty. Exploring the uncertainty of disease probability estimates can assist in the clinical decision-making process. The illustrative case study using FPG for diabetes diagnosis demonstrates the impact of measurement uncertainty on diagnostic measures, highlighting its relevance in clinical and laboratory practices. While the software offers a framework for applying Bayes' theorem in medical diagnostics, further research is needed to fully assess its utility in diagnosing various health conditions.

Data availability

The data presented in this study are available at https://wwwn.cdc.gov/nchs/nhanes/default.aspx (accessed on 18 May 2024).

Change history

28 January 2025
A Correction to this paper has been published: https://doi.org/10.1186/s12911-025-02863-6

References

Stanley DE, Campos DG. The logic of medical diagnosis. Perspect Biol Med. 2013;56(2):300–15.
Weiner ESC, Simpson JA, Oxford University Press. The Oxford English dictionary. Oxford, Oxford: Clarendon Press ; Melbourne; 1989 2004.
Zou KH, O’Malley AJ, Mauri L. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation. 2007;115(5):654–7.
Chatzimichail T, Hatjimihail AT. A Bayesian Inference Based Computational Tool for Parametric and Nonparametric Medical Diagnosis. Diagnostics. 2023;13(19):3135.
Djulbegovic B, van den Ende J, Hamm RM, Mayrhofer T, Hozo I, Pauker SG, et al. When is rational to order a diagnostic test, or prescribe treatment: the threshold model as an explanation of practice variation. Eur J Clin Invest. 2015;45(5):485–93.
Šimundić A-M. Measures of Diagnostic Accuracy: Basic Definitions. EJIFCC. 2009;19(4):203–11.
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. CRC Press; 2013. 675 p.
Bayes T, Price R. LII. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F. R. S. communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S. Philos Trans R Soc Lond. 1763;53:370–418.
Viana MAG, Ramakrishnan V. Bayesian estimates of predictive value and related parameters of a diagnostic test. Can J Stat. 1992;20(3):311–21.
van de Schoot R, Depaoli S, King R, Kramer B, Märtens K, Tadesse MG, et al. Bayesian statistics and modelling. Nature Reviews Methods Primers. 2021;1(1):1–26.
Bours MJ. Bayes’ rule in diagnosis. J Clin Epidemiol. 2021;131:158–60.
Fischer F. Using Bayes theorem to estimate positive and negative predictive values for continuously and ordinally scaled diagnostic tests. Int J Methods Psychiatr Res. 2021;30(2):e1868.
Joyce J. Bayes’ Theorem. In: Zalta EN, editor. The Stanford Encyclopedia of Philosophy [Internet]. Fall 2021. Metaphysics Research Lab, Stanford University; 2021. Available from: https://plato.stanford.edu/archives/fall2021/entries/bayes-theorem/
Casella G. An Introduction to Empirical Bayes Data Analysis. Am Stat. 1985;39(2):83–7.
Casella G. Illustrating empirical Bayes methods. Chemometrics Intellig Lab Syst. 1992;16(2):107–25.
Ayyub BM, Klir GJ. Uncertainty Modeling and Analysis in Engineering and the Sciences. Chapman and Hall/CRC; 2006.
Willink R, White R. Disentangling Classical and Bayesian Approaches to Uncertainty Analysis. New Zealand: Measurement Standards Laboratory; 2012.
Oosterhuis WP, Theodorsson E. Total error vs. measurement uncertainty: revolution or evolution? Clin Chem Lab Med. 2016;54(2):235–9.
Ramsey MH, Ellison SLR, Rostron P. Measurement uncertainty arising from sampling - A guide to methods and approaches. 2nd ed. EURACHEM/CITAC; 2019.
Ellison SLR, Williams A. Quantifying Uncertainty in Analytical Measurement. 3rd ed. EURACHEM/CITAC; 2012. Report No.: CG 4.
Chatzimichail T, Hatjimihail AT. A Software Tool for Calculating the Uncertainty of Diagnostic Accuracy Measures. Diagnostics (Basel). 2021;11(3). Available from: https://doi.org/10.3390/diagnostics11030406
Chatzimichail T, Hatjimihail AT. A Software Tool for Estimating Uncertainty of Bayesian Posterior Probability for Disease. Diagnostics (Basel). 2024;14(4). Available from: https://doi.org/10.3390/diagnostics14040402.
Kallner A, Boyd JC, Duewer DL, Giroud C, Hatjimihail AT, Klee GG, et al. Expression of Measurement Uncertainty in Laboratory Medicine; Approved Guideline. Clinical and Laboratory Standards Institute; 2012.
Joint Committee for Guides in Metrology. Evaluation of measurement data – Supplement 2 to the “Guide to the expression of uncertainty in measurement” – Extension to any number of output quantities. Pavillon de Breteuil, F-92312 Sèvres, Cedex, France: BIPM; 2011 Oct. Report No.: JCGM 102:2011. Available from: https://www.bipm.org/documents/20126/2071204/JCGM_102_2011_E.pdf/6a3281aa-1397-d703-d7a1-a8d58c9bf2a5
White GH. Basics of estimating measurement uncertainty. Clin Biochem Rev. 2008;29(Suppl 1):S53–60.
Agresti A, Franklin C, Klingenberg B. Statistics: The art and science of learning from data, global edition. 4th ed. London, England: Pearson Education; 2023.
Miller J, Miller JC. Statistics and Chemometrics for Analytical Chemistry. 7th ed. London, England: Pearson Education; 2018. 312 p.
Aitchison J, Brown JAC. The Lognormal Distribution with special reference to its uses in econometrics. Cambridge: Cambridge University Press; 1957.
Agresti A, Coull BA. Approximate is Better than “Exact” for Interval Estimation of Binomial Proportions. Am Stat. 1998;52(2):119–26.
Wilson BM, Smith BL. Taylor-series and Monte-Carlo-method uncertainty estimation of the width of a probability distribution based on varying bias and random error. Meas Sci Technol. 2013;24(3):035301.
Satterthwaite FE. An approximate distribution of estimates of variance components. Biometrics. 1946;2(6):110–4.
Welch BL. The Generalization of `Student’s’ Problem when Several Different Population Variances are Involved. Biometrika. 1947;34(1/2):28–35.
American Diabetes Association. 2. Classification and Diagnosis of Diabetes: Standards of Medical Care in Diabetes-2021. Diabetes Care. 2021 Jan;44(Suppl 1):S15–33.
Sun H, Saeedi P, Karuranga S, Pinkepank M, Ogurtsova K, Duncan BB, et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract. 2022;183:109119.
National Center for Health Statistics. National Health and Nutrition Examination Survey Data [Internet]. Centers for Disease Control and Prevention. 2005–2016 [cited 2023 Sep 4]. Available from: https://wwwn.cdc.gov/nchs/nhanes/default.aspx
National Center for Health Statistics. National Health and Nutrition Examination Survey Questionnaire [Internet]. Centers for Disease Control and Prevention. 2005–2016 [cited 2023 Sep 4]. Available from: https://wwwn.cdc.gov/nchs/nhanes/Search/variablelist.aspx?Component=Questionnaire
Petrone S, Rousseau J, Scricciolo C. Bayes and empirical Bayes: do they merge? Biometrika. 2014;101(2):285–302.
Myung IJ. Tutorial on maximum likelihood estimation. J Math Psychol. 2003;47(1):90–100.
Johnson ML. Nonlinear least-squares fitting methods. In: Methods Cell Biol. Academic Press; 2008. p. 781–805.
Bates DM, Watts DG. Nonlinear Regression Analysis and Its Applications. Hoboken, New Jersey: John Wiley & Sons, Inc.; 1988.
Darling DA. The Kolmogorov-Smirnov, Cramer-von Mises Tests. Ann Math Stat. 1957;28(4):823–38.
ElSayed NA, Aleppo G, Aroda VR, Bannuru RR, Brown FM, Bruemmer D, et al. 2. Classification and Diagnosis of Diabetes: Standards of Care in Diabetes-2023. Diabetes Care. 2023 Jan 1;46(Suppl 1):S19–40.
Lippi G, Simundic A-M, Plebani M. Potential preanalytical and analytical vulnerabilities in the laboratory diagnosis of coronavirus disease 2019 (COVID-19). Clin Chem Lab Med. 2020. Available from: https://doi.org/10.1515/cclm-2020-0285
Kroll MH, Biswas B, Budd JR, Durham P, Gorman RT, Gwise TE, Halim A-B, Hatjimihail AT, Hilden J, Song K. Assessment of the Diagnostic Accuracy of Laboratory Tests Using Receiver Operating Characteristic Curves; Approved Guideline—Second Edition. Clinical and Laboratory Standards Institute; 2011.
Tang Y-W, Schmitz JE, Persing DH, Stratton CW. The Laboratory Diagnosis of COVID-19 Infection: Current Issues and Challenges. J Clin Microbiol. 2020. Available from: https://doi.org/10.1128/JCM.00512-20
Deeks JJ, Dinnes J, Takwoingi Y, Davenport C, Leeflang MMG, Spijker R, et al. Diagnosis of SARS-CoV-2 infection and COVID-19: accuracy of signs and symptoms; molecular, antigen, and antibody tests; and routine laboratory markers. Cochrane Infectious Diseases Group, editor. Cochrane Database Syst Rev. 2020;26:1896.
Google Scholar
Infantino M, Grossi V, Lari B, Bambi R, Perri A, Manneschi M, et al. Diagnostic accuracy of an automated chemiluminescent immunoassay for anti-SARS-CoV-2 IgM and IgG antibodies: an Italian experience. J Med Virol. 2020. Available from: https://doi.org/10.1002/jmv.25932
Mahase E. Covid-19: “Unacceptable” that antibody test claims cannot be scrutinised, say experts. BMJ. 2020;18(369):m2000.
Article Google Scholar
Choi Y-K, Johnson WO, Thurmond MC. Diagnosis using predictive probabilities without cut-offs. Stat Med. 2006;25(4):699–717.
Article PubMed Google Scholar
Smith AFM, Gelfand AE. Bayesian Statistics without Tears: A Sampling-Resampling Perspective. Am Stat. 1992;46(2):84–8.
Google Scholar
Forbes C, Evans M, Hastings N, Peacock B. Statistical Distributions. John Wiley & Sons; 2011. 230 p.
Srinivasan P, Westover MB, Bianchi MT. Propagation of uncertainty in Bayesian diagnostic test interpretation. South Med J. 2012;105(9):452–9.
Article PubMed PubMed Central Google Scholar
Wereski R, Kimenai DM, Taggart C, Doudesis D, Lee KK, Lowry MTH, et al. Cardiac Troponin Thresholds and Kinetics to Differentiate Myocardial Injury and Myocardial Infarction. Circulation. 2021;144(7):528–38.
Article CAS PubMed PubMed Central Google Scholar
Roberts E, Ludman AJ, Dworzynski K, Al-Mohammad A, Cowie MR, McMurray JJV, et al. The diagnostic accuracy of the natriuretic peptides in heart failure: systematic review and diagnostic meta-analysis in the acute care setting. BMJ. 2015;4(350):h910.
Article Google Scholar
Freund Y, Chauvin A, Jimenez S, Philippon A-L, Curac S, Fémy F, et al. Effect of a Diagnostic Strategy Using an Elevated and Age-Adjusted D-Dimer Threshold on Thromboembolic Events in Emergency Department Patients With Suspected Pulmonary Embolism: A Randomized Clinical Trial. JAMA. 2021;326(21):2141–9.
Article CAS PubMed PubMed Central Google Scholar
Rani PR, Begum J. Screening and Diagnosis of Gestational Diabetes Mellitus, Where Do We Stand. J Clin Diagn Res. 2016;10(4):QE01–4.
CAS PubMed PubMed Central Google Scholar
Reyes Domingo F, Avey MT, Doull M. Screening for thyroid dysfunction and treatment of screen-detected thyroid dysfunction in asymptomatic, community-dwelling adults: a systematic review. Syst Rev. 2019;8(1):260.
Article PubMed PubMed Central Google Scholar
Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31(4):337–50.
Article PubMed PubMed Central Google Scholar
Horvath AR, Bell KJL, Ceriotti F, Jones GRD, Loh TP, Lord S, et al. Outcome-based analytical performance specifications: current status and future challenges. Clin Chem Lab Med. 2024. Available from: https://doi.org/10.1515/cclm-2024-0125
Baron JA. Uncertainty in Bayes. Med Decis Making. 1994;14(1):46–51.
Article CAS PubMed Google Scholar
Ashby D, Smith AF. Evidence-based medicine as Bayesian decision-making. Stat Med. 2000;19(23):3291–305.
Article CAS PubMed Google Scholar
Tucker LA. Limited Agreement between Classifications of Diabetes and Prediabetes Resulting from the OGTT, Hemoglobin A1c, and Fasting Glucose Tests in 7412 U.S. Adults. J Clin Med Res. 2020;9(7). Available from: https://doi.org/10.3390/jcm9072207
Sacks DB, Arnold M, Bakris GL, Bruns DE, Horvath AR, Lernmark Å, et al. Guidelines and Recommendations for Laboratory Analysis in the Diagnosis and Management of Diabetes Mellitus. Clin Chem. 2023;69(8):808–68.
Article PubMed Google Scholar
Congdon PD. Bayesian hierarchical models: With applications using R, second edition [Internet]. 2nd ed. Philadelphia, PA: Chapman & Hall/CRC; 2021 [cited 2024 Aug 4]. 592 p. Available from: https://books.google.com/books/about/Bayesian_Hierarchical_Models.html?hl=el&id=hlivDwAAQBAJ
Kass RE, Raftery AE. Bayes Factors. J Am Stat Assoc. 1995;90(430):773–95.
Article Google Scholar
Bozza S, Taroni F, Biedermann A. Bayes factors for forensic decision analyses with R. 1st ed. Cham, Switzerland: Springer International Publishing; 2022. 187 p. (Springer texts in statistics).
Willink R. Measurement Uncertainty and Probability. Cambridge University Press; 2013.
Knottnerus JA, Buntinx F, editors. The evidence base of clinical diagnosis. 2nd ed. BMJ Books; 2011. 320 p. (Evidence-Based Medicine).
Chatzimichail T, Hatjimihail AT. A Software Tool for Exploring the Relation between Diagnostic Accuracy and Measurement Uncertainty. Diagnostics (Basel). 2020;10(9):610. Available from: https://doi.org/10.3390/diagnostics10090610.
Knottnerus JA, Dinant GJ. Medicine based evidence, a prerequisite for evidence based medicine. BMJ. 1997;315(7116):1109–10.
Article CAS PubMed PubMed Central Google Scholar
Pfeiffer RM, Castle PE. With or without a gold standard. Epidemiology. 2005;16(5):595–7.
Article PubMed Google Scholar
Nair R, Aggarwal R, Khanna D. Methods of formal consensus in classification/diagnostic criteria and guideline development. Semin Arthritis Rheum. 2011;41(2):95–105.
Article PubMed PubMed Central Google Scholar
van Smeden M, Naaktgeboren CA, Reitsma JB, Moons KGM, de Groot JAH. Latent class models in diagnostic studies when there is no reference standard–a systematic review. Am J Epidemiol. 2014;179(4):423–31.
Article PubMed Google Scholar
Solberg HE. Approved recommendation (1987) on the theory of reference values. Part 5. Statistical treatment of collected reference values. Determination of reference limits. Clin Chim Acta. 1987;170(2):S13–32.
Article CAS Google Scholar
Pavlov IY, Wilson AR, Delgado JC. Reference interval computation: which method (not) to choose? Clin Chim Acta. 2012;413(13–14):1107–14.
Article CAS PubMed Google Scholar
Sikaris K. Application of the Stockholm hierarchy to defining the quality of reference intervals and clinical decision limits. Clin Biochem Rev. 2012;33(4):141–8.
PubMed PubMed Central Google Scholar
Daly CH, Liu X, Grey VL, Hamid JS. A systematic review of statistical methods used in constructing pediatric reference intervals. Clin Biochem. 2013;46(13–14):1220–7.
Article PubMed Google Scholar
Ozarda Y, Sikaris K, Streichert T, Macri J, IFCC Committee on Reference intervals and Decision Limits (C-RIDL). Distinguishing reference intervals and clinical decision limits - A review by the IFCC Committee on Reference Intervals and Decision Limits. Crit Rev Clin Lab Sci. 2018;55(6):420–31.
Wilson JMG, Jungner G. Principles and practice of screening for disease. Geneva: World Health Organization; 1968. 163 p. (Public health papers; vol. 34).
Petersen PH, Horder M. 2.3 Clinical test evaluation. Unimodal and bimodal approaches. Scand J Clin Lab Invest. 1992;52(208):51–7.
Google Scholar
Pires AM, Amado C. Interval Estimators for a Binomial Proportion: Comparison of Twenty Methods. Revstat Stat J. 2008;6(2):165–97.
Google Scholar
Schmoyeri RL, Beauchamp JJ, Brandt CC, Hoffman FO. Difficulties with the lognormal model in mean estimation and testing. Environ Ecol Stat. 1996;3(1):81–97.
Article Google Scholar
Bhaumik DK, Kapur K, Gibbons RD. Testing Parameters of a Gamma Distribution for Small Samples. Technometrics. 2009;51(3):326–34.
Article Google Scholar
Joint Committee for Guides in Metrology. Evaluation of measurement data — Supplement 1 to the “Guide to the expression of uncertainty in measurement”— Propagation of distributions using a Monte Carlo method Joint Committee for Guides in Metrology. Pavillon de Breteuil, F-92312 Sèvres, Cedex, France: BIPM; 2008. Report No.: JCGM 101:2008. Available from: https://www.bipm.org/documents/20126/2071204/JCGM_101_2008_E.pdf/325dcaad-c15a-407c-1105-8b7f322d651c
Williams A. Calculation of the expanded uncertainty for large uncertainties using the lognormal distribution. Accredit Qual Assur. 2020;25(5):335–8.
Article CAS Google Scholar
Stephens M. The Bayesian lens and Bayesian blinkers. Philos Trans A Math Phys Eng Sci. 2023;381(2247):20220144.
PubMed PubMed Central Google Scholar
Joint Committee for Guides in Metrology. Guide to the expression of uncertainty in measurement — Part 6: Developing and using measurement models [Internet]. Pavillon de Breteuil, F-92312 Sèvres, Cedex, France: BIPM; 2020. Report No.: JCGM GUM-6:2020. Available from: https://www.bipm.org/documents/20126/2071204/JCGM_GUM_6_2020.pdf/d4e77d99-3870-0908-ff37-c1b6a230a337
Oosterhuis WP. Analytical performance specifications in clinical chemistry: the holy grail? J Lab Precis Med. 2017;2:78–78.
Article Google Scholar
Box GEP. Robustness in the strategy of scientific model building. In: Robustness in Statistics. Elsevier; 1979. p. 201–36.
Rao SS, Disraeli P, McGregor T. Impaired glucose tolerance and impaired fasting glucose. Am Fam Physician. 2004;69(8):1961–8.
PubMed Google Scholar
Meneilly GS, Elliott T. Metabolic alterations in middle-aged and elderly obese patients with type 2 diabetes. Diabetes Care. 1999;22(1):112–8.
Article CAS PubMed Google Scholar
Geer EB, Shen W. Gender differences in insulin resistance, body composition, and energy balance. Gend Med. 2009;6 Suppl 1(Suppl 1):60–75.
Article PubMed Google Scholar
Van Cauter E, Polonsky KS, Scheen AJ. Roles of circadian rhythmicity and sleep in human glucose regulation. Endocr Rev. 1997;18(5):716–38.
PubMed Google Scholar
Colberg SR, Sigal RJ, Fernhall B, Regensteiner JG, Blissmer BJ, Rubin RR, et al. Exercise and type 2 diabetes: the American College of Sports Medicine and the American Diabetes Association: joint position statement. Diabetes Care. 2010;33(12):e147–67.
Article PubMed PubMed Central Google Scholar
Salmerón J, Manson JE, Stampfer MJ, Colditz GA, Wing AL, Willett WC. Dietary fiber, glycemic load, and risk of non-insulin-dependent diabetes mellitus in women. JAMA. 1997;277(6):472–7.
Article PubMed Google Scholar
Surwit RS, van Tilburg MAL, Zucker N, McCaskill CC, Parekh P, Feinglos MN, et al. Stress management improves long-term glycemic control in type 2 diabetes. Diabetes Care. 2002;25(1):30–4.
Article PubMed Google Scholar
Pandit MK, Burke J, Gustafson AB, Minocha A, Peiris AN. Drug-induced disorders of glucose tolerance. Ann Intern Med. 1993;118(7):529–39.
Article CAS PubMed Google Scholar
Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, Jackson AU, et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet. 2010;42(2):105–16.
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

Institutional review board statement

Data collection was carried out following the rules of the Declaration of Helsinki. The National Center for Health Statistics Ethics Review Board approved data collection and posting of the data online for public use. The National Center for Health Statistics NHANES—NCHS Research Ethics Review Board Approval (Protocols #2005-06 and #2011-17) is available online at: https://www.cdc.gov/nchs/nhanes/irba98.htm (accessed on May 18, 2024).

Informed consent statement

Written consent was obtained from each subject participating in the survey.

Funding

This research received no external funding.

Author information

Authors and Affiliations

Hellenic Complex Systems Laboratory, Kostis Palamas 21, 66131, Drama, Greece
Theodora Chatzimichail & Aristides T. Hatjimihail

Contributions

Conceptualization: T.C.; methodology: T.C. and A.T.H.; software: T.C. and A.T.H.; validation: T.C.; formal analysis: T.C. and A.T.H.; investigation: T.C.; resources: A.T.H.; data curation: T.C.; writing (original draft preparation): T.C.; writing (review and editing): A.T.H.; visualization: T.C.; supervision: A.T.H.; project administration: T.C. All authors reviewed the manuscript.

Corresponding author

Correspondence to Aristides T. Hatjimihail.

Ethics declarations

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article has been revised: an enumeration error concerning Subsection 5.4 page 3 of the Supplementary Material titled BayesianDiagnosticInsightsInterface.pdf has been corrected.

Supplementary Information

Supplementary Material 1.

Appendix A

A.1. Notation

A.1.1. Acronyms

CDF: cumulative distribution function

PDF: probability density function

FPG: fasting plasma glucose

OGTT: oral glucose tolerance test

QC: quality control

NHANES: National Health and Nutrition Examination Survey

A.1.2. Abbreviations

$D$: disease

$\overline{D }$: absence of disease

$T$: diagnostic test result

A.1.3. Parameters

$t$: diagnostic threshold

${\mu }_{D}$: mean of the measurements of the diseased population

${\sigma }_{D}$: standard deviation of the measurements of the diseased population

${d}_{D}$: distribution of the measurements of the diseased population

${\mu }_{\overline{D} }$: mean of the measurements of the nondiseased population

${\sigma }_{\overline{D} }$: standard deviation of the measurements of the nondiseased population

${d}_{\overline{D} }$: distribution of the measurements of the nondiseased population

${n}_{D}$: size of the diseased population sample

${m}_{D}$: mean of the measurements of the diseased population sample

${s}_{D}$: standard deviation of the measurements of the diseased population sample

${n}_{\overline{D} }$: size of the nondiseased population sample

${m}_{\overline{D} }$: mean of the measurements of the nondiseased population sample

${s}_{\overline{D} }$: standard deviation of the measurements of the nondiseased population sample

$v$: prior probability for disease or prevalence rate

${n}_{U}$: number of QC measurements

${b}_{0}$: constant contribution to measurement uncertainty

${b}_{1}$: measurement uncertainty proportionality constant

$p$: confidence level

$\boldsymbol{\theta}$: parameter vector

A.1.4. Bayesian Diagnostic Measures

$P\left(D|T\ge t\right)$: positive predictive value

$P\left(\overline{D }|T<t\right)$: negative predictive value

$P\left(D|T=t\right)$: posterior probability for disease

$P\left(\overline{D }|T=t\right)$: posterior probability for the absence of disease
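
As a rough illustration of the four measures listed in A.1.4, the following Python sketch evaluates them with Bayes' theorem. It assumes normal distributions for both populations and that higher test values indicate disease. The function names and the numbers in the usage lines are hypothetical; they are not taken from the program or from the case study.

```python
from statistics import NormalDist


def predictive_values(t, mu_d, sigma_d, mu_nd, sigma_nd, v):
    """Return (PPV, NPV) at threshold t for prior probability v.

    Assumption: both populations are normally distributed and a
    result T >= t is called positive.
    """
    diseased = NormalDist(mu_d, sigma_d)
    nondiseased = NormalDist(mu_nd, sigma_nd)
    sensitivity = 1.0 - diseased.cdf(t)   # P(T >= t | D)
    specificity = nondiseased.cdf(t)      # P(T < t | not D)
    ppv = v * sensitivity / (v * sensitivity + (1 - v) * (1 - specificity))
    npv = (1 - v) * specificity / ((1 - v) * specificity + v * (1 - sensitivity))
    return ppv, npv


def posterior_probabilities(t, mu_d, sigma_d, mu_nd, sigma_nd, v):
    """Return (P(D | T = t), P(not D | T = t)) using the two densities at t."""
    f_d = NormalDist(mu_d, sigma_d).pdf(t)
    f_nd = NormalDist(mu_nd, sigma_nd).pdf(t)
    post_d = v * f_d / (v * f_d + (1 - v) * f_nd)
    return post_d, 1.0 - post_d


# Hypothetical inputs in arbitrary units.
ppv, npv = predictive_values(t=9.0, mu_d=10.0, sigma_d=1.0,
                             mu_nd=8.0, sigma_nd=1.0, v=0.5)
post_d, post_nd = posterior_probabilities(t=9.0, mu_d=10.0, sigma_d=1.0,
                                          mu_nd=8.0, sigma_nd=1.0, v=0.5)
print(ppv, npv, post_d, post_nd)
```

The predictive values condition on the result falling on one side of the threshold, so they use cumulative distribution functions. The posterior probabilities condition on the exact value $T=t$, so they use probability density functions.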

A.1.5. Functions

$f\left(x\right)$: probability density function

$F\left(x\right)$: cumulative distribution function

${u}_{m}\left(x\right)$: standard measurement uncertainty

${u}_{s}\left(x\right)$: standard sampling uncertainty

${u}_{c}\left(x\right)$: standard combined uncertainty

${\nu }_{eff}\left(x\right)$: effective degrees of freedom

$\inf\left(f\right)$: lower bound of $f$

$\sup\left(f\right)$: upper bound of $f$
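
The parameters ${b}_{0}$ and ${b}_{1}$ of A.1.3 suggest a standard measurement uncertainty with a constant part and a part proportional to the measured value. The Python sketch below combines the two in quadrature. This functional form is an assumption made for illustration and is not stated in this appendix; the function name and the example numbers are hypothetical.

```python
import math


def measurement_uncertainty(x, b0, b1):
    """Standard measurement uncertainty u_m(x) at measured value x.

    Assumed form (not confirmed by this appendix):
    u_m(x) = sqrt(b0**2 + (b1 * x)**2), where b0 is the constant
    contribution and b1 is the proportionality constant.
    """
    return math.sqrt(b0 ** 2 + (b1 * x) ** 2)


# Hypothetical values in arbitrary units.
print(measurement_uncertainty(x=100.0, b0=3.0, b1=0.04))
```

Under this assumed form, ${b}_{0}$ dominates at low measured values and the proportional term ${b}_{1}x$ dominates at high ones.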

A.2. Input

A.2.1. Range of input parameters

$t$: $\max\left(0, \min\left({m}_{\overline{D} }-5{s}_{\overline{D} },{m}_{D}-5{s}_{\overline{D} }\right)\right)$ – $\max\left({m}_{\overline{D} }+5{s}_{\overline{D} },{m}_{D}+5{s}_{\overline{D} }\right)$

${n}_{D}$ : 2 – 10,000

${m}_{D}$: 0.1 – 10,000

${s}_{D}$: 0.01 – 1,000

${n}_{\overline{D} }$: 2 – 10,000

${m}_{\overline{D} }$ : 0.1 – 10,000

${s}_{\overline{D} }$ : 0.01 – 1,000

$v$ : 0.001 – 0.999

${n}_{U}$ : 20 – 10,000

${b}_{0}$ : 0 – ${\sigma }_{\overline{D} }$

${b}_{1}$: 0 – 0.1000

$p$: 0.900 – 0.999

$t, {m}_{D}, {s}_{D}, {m}_{\overline{D} },$ and ${s}_{\overline{D} }$ are defined in arbitrary units.
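
The ranges in A.2.1 can be checked before any calculation is run. The Python sketch below is one hypothetical way to do so; the dictionary keys and function names are illustrative and do not come from the program. The threshold range follows the expression as printed above, which uses ${s}_{\overline{D} }$ in both terms. ${b}_{0}$ is left out because its upper limit, ${\sigma }_{\overline{D} }$, is a population parameter rather than a fixed number.

```python
# Fixed ranges copied from A.2.1 (inclusive bounds are an assumption).
FIXED_RANGES = {
    "n_D": (2, 10_000),
    "m_D": (0.1, 10_000),
    "s_D": (0.01, 1_000),
    "n_nD": (2, 10_000),
    "m_nD": (0.1, 10_000),
    "s_nD": (0.01, 1_000),
    "v": (0.001, 0.999),
    "n_U": (20, 10_000),
    "b_1": (0.0, 0.1),
    "p": (0.900, 0.999),
}


def threshold_range(m_d, m_nd, s_nd):
    """Return (lower, upper) limits for t, following A.2.1 as printed."""
    lower = max(0.0, min(m_nd - 5 * s_nd, m_d - 5 * s_nd))
    upper = max(m_nd + 5 * s_nd, m_d + 5 * s_nd)
    return lower, upper


def out_of_range(params):
    """Return the names of the parameters that fall outside their range."""
    bad = [name for name, (low, high) in FIXED_RANGES.items()
           if not low <= params[name] <= high]
    low_t, high_t = threshold_range(params["m_D"], params["m_nD"], params["s_nD"])
    if not low_t <= params["t"] <= high_t:
        bad.append("t")
    return bad


# Hypothetical settings in arbitrary units.
example = {"t": 126, "n_D": 100, "m_D": 150, "s_D": 30,
           "n_nD": 1000, "m_nD": 95, "s_nD": 10,
           "v": 0.1, "n_U": 40, "b_1": 0.02, "p": 0.95}
print(out_of_range(example))
```

Returning the list of offending names, rather than raising on the first failure, lets a caller report every problem at once.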

A.2.2. Additional input options

A.2.2.1. Plots

Users can select between an extended and limited plot range.

A.2.2.2. Tables

Users can define the number of decimal digits for results, ranging from 1 to 10.

A.3. Software availability and requirements

Program name: Bayesian Diagnostic Insights

Version: 2.1.0

Project home page: https://www.hcsl.com/Tools/BayesianDiagnosticInsights/ (accessed on October 4, 2024)

Program source: BayesianDiagnosticInsights.nb. Available at: https://www.hcsl.com/Tools/BayesianDiagnosticInsights/BayesianDiagnosticInsights.nb (accessed on October 4, 2024).

Operating systems: Microsoft Windows 10+, Linux 3.15+, Apple macOS 11+

Programming language: Wolfram Language

Other software requirements: Running the program and reading the BayesianDiagnosticInsightsCalculations.nb file requires either Wolfram Player® ver. 14.0+, freely available at https://www.wolfram.com/player/ (accessed on September 23, 2024), or Wolfram Mathematica® ver. 14.0+.

System requirements: Intel® i9™ or equivalent CPU and 32 GB of RAM

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

A.4. A note about the program controls

The program has a tabbed user interface for navigating its modules and submodules.

Users may define the numerical settings with menus or sliders. For finer control of a slider, hold down the alt (or opt) key while dragging the mouse. Holding down the shift or ctrl key as well gives even finer control.

Dragging with the mouse while pressing the ctrl, alt, or opt keys zooms plots in or out. When the mouse cursor is positioned over a point on a curve in a plot, the coordinates of that point are displayed, and vertical drop lines are drawn to the respective axes.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Chatzimichail, T., Hatjimihail, A.T. A software tool for applying Bayes' theorem in medical diagnostics. BMC Med Inform Decis Mak 24, 399 (2024). https://doi.org/10.1186/s12911-024-02721-x

Received: 19 May 2024
Accepted: 14 October 2024
Published: 21 December 2024
DOI: https://doi.org/10.1186/s12911-024-02721-x
