
Table 4 Policy Learning: Rewards (\(\mu \pm \sigma\)) of policies learned using the self-normalized inverse propensity scoring (SNIPS) formulation (10 simulations). Optdigits and Letter are two multiclass classification datasets from the UCI repository [30]. LR = Logistic Regression; NN = Neural Network.

From: Clinical decision making under uncertainty: a bootstrapped counterfactual inference approach

The four reward columns are grouped under the header \(\hat{h}_0\) - NN.

| Dataset | Expert Policy / Logging Policy | Single NN: SNIPS | NN Ensemble: SNIPS\(_{inv}\) | NN Ensemble: SNIPS\(_{avg}\) | Adversarial: SNIPS |
|---|---|---|---|---|---|
| UCI | OPTDIGITS (10 actions) | 0.785 ± 0.113 | 0.800 ± 0.143 | 0.805 ± 0.072 | 0.767 ± 0.093 |
| UCI | LETTER (26 actions) | 0.167 ± 0.055 | 0.139 ± 0.031 | 0.121 ± 0.047 | 0.157 ± 0.041 |
| Warfarin | LR (3 actions) | 0.602 ± 0.024 | 0.608 ± 0.024 | 0.600 ± 0.019 | 0.597 ± 0.024 |
| Warfarin | LR (5 actions) | 0.522 ± 0.022 | 0.525 ± 0.021 | 0.527 ± 0.024 | 0.532 ± 0.019 |
| Warfarin | PHARMA (3 actions) | 0.657 ± 0.015 | 0.646 ± 0.019 | 0.652 ± 0.020 | 0.672 ± 0.017 |
| Warfarin | PHARMA (5 actions) | 0.602 ± 0.016 | 0.588 ± 0.018 | 0.600 ± 0.018 | 0.628 ± 0.008 |
| Heparin | Clinician (unknown) | 0.321 ± 0.030 | 0.337 ± 0.046 | 0.336 ± 0.054 | 0.325 ± 0.039 |
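
For readers unfamiliar with the SNIPS formulation named in the caption, the following is a minimal sketch of the standard self-normalized estimator: the importance-weighted sum of logged rewards divided by the sum of the importance weights. It is not the authors' implementation. The function name `snips_estimate`, its argument names, and the toy numbers are hypothetical and chosen only for illustration; none of the values come from the table.

```python
def snips_estimate(rewards, target_probs, logging_probs):
    """Self-normalized IPS estimate of a target policy's expected reward.

    rewards[i]       : reward observed for the i-th logged action
    target_probs[i]  : probability the evaluated policy assigns to that action
    logging_probs[i] : probability the logging policy assigned to that action
    (All three names are illustrative assumptions, not from the paper.)
    """
    if not (len(rewards) == len(target_probs) == len(logging_probs)):
        raise ValueError("all inputs must have the same length")
    if any(p <= 0 for p in logging_probs):
        raise ValueError("logging propensities must be strictly positive")

    # Importance weight of each logged sample: pi(a|x) / pi_0(a|x).
    weights = [t / l for t, l in zip(target_probs, logging_probs)]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("target policy puts zero mass on every logged action")

    # Dividing by the weight sum (instead of the sample count) is the
    # self-normalization step that distinguishes SNIPS from plain IPS.
    weighted_reward = sum(w * r for w, r in zip(weights, rewards))
    return weighted_reward / total_weight


# Toy logged data (made up for illustration).
toy_rewards = [1.0, 0.0, 1.0, 0.0]
toy_target = [0.5, 0.5, 0.5, 0.5]
toy_logging = [0.25, 0.5, 0.5, 0.25]
estimate = snips_estimate(toy_rewards, toy_target, toy_logging)
print(estimate)  # 0.5
```

Because the estimate is a ratio of weighted sums, rescaling every importance weight by the same constant leaves it unchanged, and when the target and logging policies coincide it reduces to the plain average of the logged rewards.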