I'm new to ClearTK and UIMA, and so far I haven't found any examples of a pipeline that involves no files.
I'm trying to process a short text stored in a Java String variable using ClearTK and UIMA, and get an XML String back (the output of the ClearTK TimeML annotators).
I was able to provide a String as input (see the code excerpt), but the code is far from elegant: I had to set an empty URI on the CAS. Also, the output is saved to a file, whereas I want a String back; it makes no sense to write the output to a file and then read that file back into memory.
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.jcas.JCas;
import org.cleartk.corpus.timeml.TempEval2007Writer;
import org.cleartk.opennlp.tools.PosTaggerAnnotator;
import org.cleartk.snowball.DefaultSnowballStemmer;
import org.cleartk.timeml.event.*;
import org.cleartk.timeml.time.TimeTypeAnnotator;
import org.cleartk.timeml.tlink.TemporalLinkEventToDocumentCreationTimeAnnotator;
import org.cleartk.timeml.tlink.TemporalLinkEventToSameSentenceTimeAnnotator;
import org.cleartk.timeml.tlink.TemporalLinkEventToSubordinatedEventAnnotator;
import org.cleartk.timeml.type.DocumentCreationTime;
import org.cleartk.token.tokenizer.TokenAnnotator;
import org.cleartk.util.cr.FilesCollectionReader;
...
String documentText = "First make sure that you are using eggs that are several days old...";
JCas sourceCas = createJCas();
sourceCas.setDocumentText(documentText);
ViewUriUtil.setURI(sourceCas, new URI(""));
SimplePipeline.runPipeline(
    sourceCas,
    org.cleartk.opennlp.tools.SentenceAnnotator.getDescription(),
    TokenAnnotator.getDescription(),
    PosTaggerAnnotator.getDescription(),
    DefaultSnowballStemmer.getDescription("English"),
    org.cleartk.opennlp.tools.ParserAnnotator.getDescription(),
    org.cleartk.timeml.time.TimeAnnotator.FACTORY.getAnnotatorDescription(),
    TimeTypeAnnotator.FACTORY.getAnnotatorDescription(),
    EventAnnotator.FACTORY.getAnnotatorDescription(),
    EventTenseAnnotator.FACTORY.getAnnotatorDescription(),
    EventAspectAnnotator.FACTORY.getAnnotatorDescription(),
    EventClassAnnotator.FACTORY.getAnnotatorDescription(),
    EventPolarityAnnotator.FACTORY.getAnnotatorDescription(),
    EventModalityAnnotator.FACTORY.getAnnotatorDescription(),
    AnalysisEngineFactory.createEngineDescription(AddEmptyDCT.class),
    TemporalLinkEventToDocumentCreationTimeAnnotator.FACTORY.getAnnotatorDescription(),
    TemporalLinkEventToSameSentenceTimeAnnotator.FACTORY.getAnnotatorDescription(),
    TemporalLinkEventToSubordinatedEventAnnotator.FACTORY.getAnnotatorDescription(),
    TempEval2007Writer.getDescription("file:///tmp/out.tml"));
What would be the recommended way to have the pipeline take a String as input and produce another String as the execution result?
Run your engines with SimplePipeline as you did, then retrieve the annotations you are interested in from your sourceCas like this:
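Collection<MyAnnotation> myAnnotations = JCasUtil.select(sourceCas, MyAnnotation.class);
for (MyAnnotation annotation : myAnnotations) {
    String myProperty = annotation.getMyproperty(); // select() returns a collection, so read each annotation in turn
}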
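Once the annotations have been selected, the XML can be assembled in memory instead of going through a file. The following is only a rough sketch of that idea using plain Java: the Span class, the render method and the tag names are hypothetical stand-ins for whatever begin/end offsets and types you read from the ClearTK annotations, and the output is simple inline markup, not the exact TimeML that TempEval2007Writer produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class InlineXmlSketch {

    /** Hypothetical stand-in for a CAS annotation: character offsets plus an XML tag name. */
    static final class Span {
        final int begin;
        final int end;
        final String tag;

        Span(int begin, int end, String tag) {
            this.begin = begin;
            this.end = end;
            this.tag = tag;
        }
    }

    /** Escapes the characters that cannot appear literally in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps each non-overlapping span of the text in its tag and returns the result as a String. */
    static String render(String text, List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.begin));
        StringBuilder out = new StringBuilder();
        int cursor = 0; // first character of the text not yet copied to the output
        for (Span s : sorted) {
            if (s.begin < cursor || s.end < s.begin || s.end > text.length()) {
                throw new IllegalArgumentException("overlapping or out-of-range span");
            }
            out.append(escape(text.substring(cursor, s.begin)));
            out.append('<').append(s.tag).append('>');
            out.append(escape(text.substring(s.begin, s.end)));
            out.append("</").append(s.tag).append('>');
            cursor = s.end;
        }
        out.append(escape(text.substring(cursor)));
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Boil the eggs for ten minutes.";
        List<Span> spans = Arrays.asList(new Span(0, 4, "EVENT"), new Span(18, 29, "TIMEX3"));
        // prints: <EVENT>Boil</EVENT> the eggs for <TIMEX3>ten minutes</TIMEX3>.
        System.out.println(render(text, spans));
    }
}
```

In a real pipeline the spans would be built inside the loop over JCasUtil.select(...), using each annotation's getBegin() and getEnd(). If a generic XMI dump is acceptable instead of TimeML, UIMA's XmiCasSerializer.serialize(CAS, OutputStream) can write into a ByteArrayOutputStream, which also avoids the temporary file.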