From: A pseudonymized corpus of occupational health narratives for clinical entity recognition in Spanish
Metric | Total |
---|---|
Documents | 1,787 |
Tokens | 221,854 |
Vocabulary | 20,779 |
Lexical diversity | 9.4% |
Tokens per document | 124 ± 93 |
Entities per document | 8.6 ± 5.7 |
Annotated tokens | 27,036 |
PII entities | 5,460 |
Medical entities | 10,019 |
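Several rows are derived from the raw totals. Tokens divided by documents gives the mean document length (221,854 / 1,787 ≈ 124). The reported lexical diversity matches a type-token ratio: 20,779 / 221,854 ≈ 9.4%. The sketch below reproduces both checks and shows how such statistics could be computed from a tokenized corpus. The function names, the case-sensitive vocabulary count, and the use of the sample standard deviation for the "±" spread are illustrative assumptions, not details confirmed by the source.

```python
import statistics
from dataclasses import dataclass


@dataclass
class CorpusStats:
    documents: int
    tokens: int
    vocabulary: int
    lexical_diversity_pct: float
    tokens_per_doc_mean: float
    tokens_per_doc_sd: float


def type_token_ratio_pct(vocabulary: int, tokens: int) -> float:
    """Distinct types divided by total tokens, expressed as a percentage."""
    return 100.0 * vocabulary / tokens


def corpus_stats(docs: list[list[str]]) -> CorpusStats:
    """Summarize a corpus given as a list of already-tokenized documents.

    Assumptions (hypothetical): vocabulary counts exact, case-sensitive
    token strings, and the spread is the sample standard deviation.
    """
    lengths = [len(doc) for doc in docs]
    total = sum(lengths)
    types = len({tok for doc in docs for tok in doc})
    # stdev needs at least two data points; fall back to 0.0 otherwise.
    sd = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return CorpusStats(
        documents=len(docs),
        tokens=total,
        vocabulary=types,
        lexical_diversity_pct=type_token_ratio_pct(types, total),
        tokens_per_doc_mean=total / len(docs),
        tokens_per_doc_sd=sd,
    )


if __name__ == "__main__":
    # Consistency check using only the totals reported in the table.
    print(f"{type_token_ratio_pct(20_779, 221_854):.1f}%")  # 9.4%
    print(f"{221_854 / 1_787:.0f} tokens/doc")               # 124 tokens/doc

    # Toy corpus of two short Spanish "documents".
    toy = [["el", "paciente", "refiere", "dolor"], ["dolor", "lumbar"]]
    print(corpus_stats(toy))
```

On the toy corpus, "dolor" occurs in both documents but counts once toward the vocabulary, giving 5 types over 6 tokens.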