Are there any available biomedical datasets that have been annotated? I am learning how biomedical texts are annotated, in particular for disambiguation, but I am open to seeing annotations made for other purposes.
Here are some annotated corpora, grouped by entity type:
| Entity | Corpus | Type | Size (sentences) |
|------------------|-----------------------------|------------|------------------|
| Gene and Protein | GENETAG [7] | Sentences | 20000 |
| | JNLPBA [6] (from GENIA [8]) | Abstracts | 22402 |
| | FSUPRGE [9] | Abstracts | ≈29447* |
| | PennBioIE [10] | Abstracts | ≈22877* |
| Species | OrganismTagger Corpus [11] | Full texts | 9863 |
| | Linnaeus Corpus [12] | Full texts | 19491 |
| Disorders | SCAI Disease [13] | Abstracts | ≈3640* |
| | EBI Disease [14] | Sentences | 600 |
| | Arizona Disease (AZDC) [15] | Sentences | 2500 |
| | BioText [16] | Abstracts | 3655 |
| Chemical | SCAI IUPAC [17] | Sentences | 20300 |
| | SCAI General [18] | Sentences | 914 |
| Anatomy | AnEM | Sentences | 4700 |
| Miscellaneous | CellFinder | Full texts | 2100 |