
Table 3 Impact of the choice of training data on performance in Yellow Card test data

From: Automated redaction of names in adverse event reports using transformer-based neural networks

  

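Precision, Recall, F1 and False positive rate give performance in YC test data. NAMES and NON-NAMES give the number of tokens of each class in YC test data.

| Token length | Training data | Precision | Recall | F1 | False positive rate | NAMES | NON-NAMES |
|---|---|---|---|---|---|---|---|
| All | i2b2 | 59% | 66% | 62% | 0.03% | 179 | 263,272 |
| All | i2b2 + YC | 55% | 87% | 67% | 0.05% | 179 | 263,272 |
| Long (> 3) | i2b2 | 59% | 95% | 73% | 0.04% | 108 | 162,582 |
| Long (> 3) | i2b2 + YC | 58% | 94% | 72% | 0.04% | 108 | 162,582 |
| Short (≤ 3) | i2b2 | 63% | 21% | 32% | 0.01% | 71 | 100,690 |
| Short (≤ 3) | i2b2 + YC | 50% | 75% | 60% | 0.05% | 71 | 100,690 |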
Bold: best score; YC: Yellow Card
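Because the table reports rounded percentages alongside token counts, its rows can be cross-checked. The sketch below is an illustrative check, not the authors' evaluation code. It assumes the standard token-level definitions (precision = TP / (TP + FP), recall = TP / number of NAME tokens, false positive rate = FP / number of NON-NAME tokens), which the table itself does not spell out. It recomputes F1 as the harmonic mean of precision and recall and estimates the true and false positive counts implied by recall and false positive rate. The names `ROWS`, `f1` and `approx_counts` are hypothetical.

```python
"""Consistency checks on the rounded figures in Table 3 (illustrative sketch only)."""

# (token length, training data) -> (precision, recall, reported F1, FPR, NAMES, NON-NAMES)
# Values copied from the table; percentages are rounded, so agreement is only approximate.
ROWS = {
    ("All", "i2b2"): (0.59, 0.66, 0.62, 0.0003, 179, 263_272),
    ("All", "i2b2 + YC"): (0.55, 0.87, 0.67, 0.0005, 179, 263_272),
    ("Long (> 3)", "i2b2"): (0.59, 0.95, 0.73, 0.0004, 108, 162_582),
    ("Long (> 3)", "i2b2 + YC"): (0.58, 0.94, 0.72, 0.0004, 108, 162_582),
    ("Short (≤ 3)", "i2b2"): (0.63, 0.21, 0.32, 0.0001, 71, 100_690),
    ("Short (≤ 3)", "i2b2 + YC"): (0.50, 0.75, 0.60, 0.0005, 71, 100_690),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def approx_counts(recall: float, fpr: float, n_names: int, n_non_names: int):
    """Approximate TP and FP token counts implied by recall and false positive rate.

    ASSUMPTION: recall = TP / n_names and FPR = FP / n_non_names (standard token-level
    definitions; not stated in the table).
    """
    tp = recall * n_names
    fp = fpr * n_non_names
    return tp, fp


# Long and short token counts should add up to the "All" totals.
assert 108 + 71 == 179
assert 162_582 + 100_690 == 263_272

for (length, data), (p, r, f1_reported, fpr, n_names, n_non) in ROWS.items():
    tp, fp = approx_counts(r, fpr, n_names, n_non)
    implied_precision = tp / (tp + fp)  # precision implied by the approximate counts
    print(
        f"{length:<12} {data:<10} F1 reported {f1_reported:.0%}, "
        f"from P and R {f1(p, r):.1%}; TP ~{tp:.0f}, FP ~{fp:.0f}, "
        f"implied precision {implied_precision:.0%} (reported {p:.0%})"
    )
```

Every recomputed F1 agrees with the reported value to within one percentage point. For the All / i2b2 row, 66% of 179 names gives about 118 true positives and 0.03% of 263,272 non-names about 79 false positives, implying a precision of about 60%, close to the reported 59%. Larger gaps in the short-token rows are expected, since a false positive rate rounded to 0.01% leaves the false positive count uncertain by several tokens.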