java uima watson-explorer

How can I access the document filename or URL in a custom UIMA annotator using IBM Content Analytics?


I am writing a custom Java annotator for our UIMA pipeline in Watson Explorer Content Analytics.

There are two places (that I know of) where I could try to get the URL or filename of the document that is currently being processed.

Initialize

public class CustomAnnotator extends JCasAnnotator_ImplBase {

@Override
public void initialize(UimaContext aContext)
        throws ResourceInitializationException {
    super.initialize(aContext);
    // ... here, maybe? ...

Or

Process

@Override
public void process(JCas jcas) throws AnalysisEngineProcessException {
    try {
        // ... here ...

I have tried several options:

  • via the context in the initialize method (running the pipeline on the server, I could get the PearID, for example),
  • via the Sofa in the process method (e.g. jcas.getSofa().getSofaURI()).

I also found SourceDocumentInformation, but that type is only an example. Its getUri() method looks promising, but it only helps if IBM populates the value via setUri(String).

So far I have not been successful. I hope I have overlooked something.


Solution

  • I asked the same question on IBM dwAnswers. In short, you can access multiple views when the pipeline runs in the Watson Explorer Content Analytics server. For metadata, inspect the _InitialView rather than the rlw-view, which is the view that holds all annotations created by the custom pipeline you build in Content Analytics Studio. More details can be found here; also look at the responses: https://www.ibm.com/developerworks/community/blogs/ibmandgoogle/entry/Exporting_annotations_from_Watson_Explorer_Content_Analytics?lang=en
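
The answer does not say which type or feature in _InitialView carries the filename or URL, so a reasonable first step is to dump everything that view holds and look for it. Below is a minimal sketch that uses only the standard Apache UIMA CAS API. The class and method names (ViewMetadata, describe, describeInitialView) are hypothetical, and what the Content Analytics server actually puts into that view is an assumption to verify on your own installation.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.CASException;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.FeatureStructure;
import org.apache.uima.jcas.JCas;

/** Hypothetical helper; the names here are illustrative, not part of any IBM API. */
final class ViewMetadata {

    private ViewMetadata() {}

    /**
     * Lists every primitive-valued feature (strings, numbers, booleans) of every
     * indexed feature structure in the given view, as "type:feature=value".
     */
    static List<String> describe(CAS view) {
        List<String> lines = new ArrayList<String>();
        // Asking for the top type returns all indexed feature structures in this view.
        FSIterator<FeatureStructure> it =
                view.getIndexRepository().getAllIndexedFS(view.getTypeSystem().getTopType());
        while (it.hasNext()) {
            FeatureStructure fs = it.next();
            for (Feature f : fs.getType().getFeatures()) {
                // Skip references to other feature structures and arrays.
                if (f.getRange().isPrimitive()) {
                    lines.add(fs.getType().getName() + ":" + f.getShortName()
                            + "=" + fs.getFeatureValueAsString(f));
                }
            }
        }
        return lines;
    }

    /** Switches to the default view ("_InitialView") before dumping its contents. */
    static List<String> describeInitialView(JCas jcas) throws CASException {
        JCas initial = jcas.getView(CAS.NAME_DEFAULT_SOFA);
        return describe(initial.getCas());
    }
}
```

Inside process(JCas jcas) you could call ViewMetadata.describeInitialView(jcas), log the returned lines, and wrap the CASException in an AnalysisEngineProcessException. Once the log shows which type holds the document URL or filename, the dump can be replaced by a direct lookup of that one feature.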