I'm currently using Apache UIMA to retrieve a list of occurrences of phenotype terms. However, the documentation (why do so many bioinformatics software APIs lack good documentation?) only seems to point towards the CAS debugger GUI, rather than explaining how to get the annotation index back programmatically.
https://i.sstatic.net/giNoj.png - Picture of the CAS GUI; I want my code to return the annotation index shown in the bottom left.
Like I said, the docs don't really answer this (https://uima.apache.org/documentation.html), but generally I want to be able to call the process() method in the Annotator class, and for it to return the annotation index once it has found any and all occurrences.
Sorry if it's a silly question with an obvious answer. I've spent three hours going through the docs so far and haven't come any closer to finding the answer. If anyone has tried integrating UIMA into a project in a similar way and can point me in the right direction, it would be much appreciated!
The process methods do not return the annotations; they change the state inside the CAS. After calling ae.process(cas) or ae.process(jcas), the annotations are stored in the CAS, so you can get the annotation index from the (J)CAS afterwards, e.g. with cas.getAnnotationIndex().
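To make that concrete, here is a minimal sketch using only the core UIMA API. It assumes you already have an initialized AnalysisEngine; the class name AnnotationCollector and its two helper methods are made up for illustration and are not part of UIMA.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.cas.text.AnnotationIndex;

// Hypothetical helper class (not part of UIMA), shown only to illustrate the pattern.
public class AnnotationCollector {

    /** Copies every annotation currently in the CAS's annotation index into a list. */
    public static List<AnnotationFS> collect(CAS cas) {
        AnnotationIndex<AnnotationFS> index = cas.getAnnotationIndex();
        List<AnnotationFS> annotations = new ArrayList<>();
        for (AnnotationFS annotation : index) {
            annotations.add(annotation);
        }
        return annotations;
    }

    /** Runs the engine on the text, then reads the results back out of the CAS. */
    public static List<AnnotationFS> annotate(AnalysisEngine ae, String text) throws Exception {
        CAS cas = ae.newCAS();
        cas.setDocumentText(text);
        // process() hands back a ProcessTrace, not the annotations; those live in the CAS.
        ae.process(cas);
        return collect(cas);
    }
}
```

Note that the returned annotations still belong to the CAS they came from, so they are only valid until that CAS is reset or released. If you need the results to outlive the CAS, copy out the values you care about (begin, end, covered text) instead of keeping the AnnotationFS objects.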
Apache uimaFIT might also be convenient for you as it provides various "select" methods to access annotations in the (J)CAS, e.g.:
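import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.fit.util.CasUtil;
import org.apache.uima.fit.util.JCasUtil;
// CAS version: look up the type by name ("my.Token" stands for your own annotation type)
Type tokenType = CasUtil.getType(cas, "my.Token");
for (AnnotationFS token : CasUtil.select(cas, tokenType)) {
  ...
}
// JCas version: assumes Token is the JCas class generated for that type
for (Token token : JCasUtil.select(jcas, Token.class)) {
  ...
}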
More detailed information on this API can be found in the uimaFIT documentation, in particular in the sections on pipelines and on access methods.
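For orientation, an end-to-end sketch with uimaFIT might look roughly like the following. Everything specific in it is an assumption for illustration: TermAnnotator is a made-up toy annotator, "ataxia" is a placeholder term, and the built-in Annotation type stands in for whatever phenotype annotation type your real annotator creates.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;
import org.apache.uima.jcas.tcas.DocumentAnnotation;

public class TermPipelineSketch {

    /** Hypothetical toy annotator: marks every occurrence of one hard-coded term. */
    public static class TermAnnotator extends JCasAnnotator_ImplBase {
        private static final String TERM = "ataxia"; // placeholder term

        @Override
        public void process(JCas jcas) throws AnalysisEngineProcessException {
            String text = jcas.getDocumentText();
            for (int i = text.indexOf(TERM); i >= 0; i = text.indexOf(TERM, i + 1)) {
                // The annotation is stored in the CAS; nothing is returned from process().
                new Annotation(jcas, i, i + TERM.length()).addToIndexes();
            }
        }
    }

    /** Runs the annotator on the text and returns "coveredText@begin" for each hit. */
    public static List<String> findTerms(String text) throws Exception {
        AnalysisEngine engine = AnalysisEngineFactory.createEngine(TermAnnotator.class);
        JCas jcas = JCasFactory.createJCas();
        jcas.setDocumentText(text);
        SimplePipeline.runPipeline(jcas, engine);

        List<String> hits = new ArrayList<>();
        for (Annotation a : JCasUtil.select(jcas, Annotation.class)) {
            // Skip the built-in document-level annotation if it is in the index.
            if (a instanceof DocumentAnnotation) {
                continue;
            }
            hits.add(a.getCoveredText() + "@" + a.getBegin());
        }
        return hits;
    }
}
```

In a real project you would normally select your own annotation type (e.g. a generated JCas class for phenotype mentions) rather than the generic Annotation, which also removes the need for the DocumentAnnotation check.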
Disclosure: I am working on Apache uimaFIT.