
Table 9 Validation on real narratives with models fine-tuned using narratives generated with P1

From: The aluminum standard: using generative Artificial Intelligence tools to synthesize and annotate non-structured patient data

| Models | Metrics | P | R | F1 |
|---|---|---|---|---|
| BERT-Base German | Exact | 0.203 | 0.115 | 0.147 |
| | Partial | 0.293 | 0.166 | 0.212 |
| | Entity Type | 0.382 | 0.217 | 0.276 |
| SCAI-BIO/BioGottBERT-base | Exact | 0.31 | 0.161 | 0.212 |
| | Partial | 0.434 | 0.226 | 0.297 |
| | Entity Type | 0.558 | 0.29 | 0.382 |
| allenai/scibert_scivocab_uncased (SciBERT) | Exact | 0.185 | 0.129 | 0.152 |
| | Partial | 0.275 | 0.191 | 0.266 |
| | Entity Type | 0.364 | 0.253 | 0.299 |
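The P, R, and F1 columns can be cross-checked against each other. The following is a minimal sketch, assuming F1 here is the standard harmonic mean of precision and recall (the table itself does not state the formula). The names `f1_score`, `TABLE_9`, and `inconsistent_rows`, and the tolerance of 0.005, are illustrative choices and not part of the original article.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (model, matching scheme)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from Table 9 as printed.
TABLE_9: Dict[Row, Tuple[float, float, float]] = {
    ("BERT-Base German", "Exact"): (0.203, 0.115, 0.147),
    ("BERT-Base German", "Partial"): (0.293, 0.166, 0.212),
    ("BERT-Base German", "Entity Type"): (0.382, 0.217, 0.276),
    ("BioGottBERT", "Exact"): (0.31, 0.161, 0.212),
    ("BioGottBERT", "Partial"): (0.434, 0.226, 0.297),
    ("BioGottBERT", "Entity Type"): (0.558, 0.29, 0.382),
    ("SciBERT", "Exact"): (0.185, 0.129, 0.152),
    ("SciBERT", "Partial"): (0.275, 0.191, 0.266),
    ("SciBERT", "Entity Type"): (0.364, 0.253, 0.299),
}


def inconsistent_rows(
    table: Dict[Row, Tuple[float, float, float]], tol: float = 0.005
) -> List[Row]:
    """Return rows whose reported F1 differs from the recomputed F1 by more than tol.

    The tolerance is an assumption: it allows for P and R having been
    rounded to three decimals before publication.
    """
    flagged = []
    for key, (p, r, reported) in table.items():
        if abs(f1_score(p, r) - reported) > tol:
            flagged.append(key)
    return flagged


if __name__ == "__main__":
    for key, (p, r, reported) in TABLE_9.items():
        print(f"{key[0]:<18} {key[1]:<12} recomputed={f1_score(p, r):.3f} reported={reported:.3f}")
    print("Rows outside tolerance:", inconsistent_rows(TABLE_9))
```

Under that assumption, the only row flagged is the SciBERT Partial row, where P = 0.275 and R = 0.191 give a recomputed F1 of about 0.225 against the printed 0.266. This may be a typographical error in the source table, but the check alone cannot confirm that.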

  1. BioGottBERT obtains the highest F1 and recall scores for the exact and partial metrics. It also outperforms the other models in precision for the partial metric. In general, the models achieve significantly lower scores on real narratives than on synthetic narratives. Real narratives differ from synthetic ones mainly in word choice and prose style (which is why synthesized texts generated with ChatGPT can appear subjectively naive). Because of this, we do not expect the models trained on synthetic data to perform better