nlp annotations

Available annotated biomedical datasets


Are there any publicly available biomedical datasets that have been annotated? I am learning how biomedical texts are annotated, in particular for disambiguation, but I am open to seeing annotations made for other purposes.


Solution

  • Here are some annotated corpora, grouped by the type of entity they annotate:

    | Entity           | Corpus                      | Type       | Size (sentences) |
    |------------------|-----------------------------|------------|------------------|
    | Gene and Protein | GENETAG [7]                 | Sentences  | 20000            |
    |                  | JNLPBA [6] (from GENIA [8]) | Abstracts  | 22402            |
    |                  | FSUPRGE [9]                 | Abstracts  | ≈29447*          |
    |                  | PennBioIE [10]              | Abstracts  | ≈22877*          |
    | Species          | OrganismTagger Corpus [11]  | Full texts | 9863             |
    |                  | Linnaeus Corpus [12]        | Full texts | 19491            |
    | Disorders        | SCAI Disease [13]           | Abstracts  | ≈3640*           |
    |                  | EBI Disease [14]            | Sentences  | 600              |
    |                  | Arizona Disease (AZDC) [15] | Sentences  | 2500             |
    |                  | BioText [16]                | Abstracts  | 3655             |
    | Chemical         | SCAI IUPAC [17]             | Sentences  | 20300            |
    |                  | SCAI General [18]           | Sentences  | 914              |
    | Anatomy          | AnEM                        | Sentences  | 4700             |
    | Miscellaneous    | CellFinder                  | Full texts | 2100             |
    

    source
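
To get a feel for what such annotations look like in practice: several of the gene/protein corpora above (JNLPBA, for instance) are commonly distributed in a token-per-line IOB-style format, where each token carries a tag such as `B-protein`, `I-protein`, or `O`. The sketch below is only an illustration under that assumption. The function name `iob2_to_spans` and the example sentence and tags are made up for demonstration and are not taken from any of the corpora; check each corpus's own documentation for its actual format.

```python
from typing import List, Tuple


def iob2_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2-tagged tokens into (entity text, entity label) pairs.

    Hypothetical helper for illustration; not part of any corpus toolkit.
    An I- tag that does not continue an open entity of the same label
    is treated like O and dropped.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close the previous one, if any.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" (outside) or an inconsistent I- tag: close any open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = []
            current_label = None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


# Illustrative (invented) sentence with IOB2 tags.
tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B", "."]
tags = ["B-DNA", "I-DNA", "O", "O", "B-protein", "I-protein", "O"]

for text, label in iob2_to_spans(tokens, tags):
    print(f"{label}: {text}")
```

Note that annotations like these mark entity mentions only. For disambiguation you would additionally need each mention linked to an identifier in a database or ontology, which not every corpus in the table provides.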