Fig. 2

(a) Distribution of the input EHRs over the three sources; (b) distribution of EHRs in terms of sentences per report, tokens per report, and tokens per sentence; (c) distribution of the semantic categories over the topic-filtered sentences; (d) distribution of gold-standard outcomes by data source