
Can't use OLLIE open information extraction method in Stanford Core NLP OpenIE


I am having trouble extracting triples using OLLIE with Stanford CoreNLP's OpenIE tools.

I've installed both stanford-corenlp-3.9.1 and stanford-corenlp-3.9.2 to try to extract triples from text.

For stanford-corenlp-3.9.1:

  • I can only extract information using the default method, despite adding the flag "-format ollie" or "-openie.format ollie".
  • I tested it with this sentence:

    Some people say Barack Obama was not born in the United States.

    Which should yield this:

    (Barack Obama; was not born in; the United States)[attrib=Some people say]

    This example is meant to test whether the OpenIE method is indeed OLLIE, but I get no triples at all. Extraction does work for other sentences, but the output is that of the default method.

For stanford-corenlp-3.9.2:

  • I was unable to extract any triples at all and get this error instead:

    'java.lang.IllegalArgumentException: annotator "openie" requires annotation "CorefChainAnnotation". The usual requirements for this annotator are: tokenize,ssplit,pos,lemma,depparse,natlog'
    

EDITED:

  1. It turns out OLLIE isn't supported in Stanford OpenIE; the flag merely changes the output to OLLIE's format.
  2. I was able to run version 3.9.2 (see the reply below).
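
To make point 1 concrete, here is a minimal sketch of the difference between an output format and an extraction method. It is not CoreNLP code: the class `TripleFormatSketch`, its `Triple` holder, and the sample triple are hypothetical, and the exact confidence formatting is an assumption. The same triple is simply printed in two styles, one tab-separated and one in the Ollie-like `confidence: (subject; relation; object)` shape.

```java
import java.util.Locale;

// Illustrative only: this is NOT CoreNLP code. It prints one and the same
// triple in two styles to show that a format flag changes presentation,
// not which triples get extracted.
final class TripleFormatSketch {

    // Hypothetical holder for a single extracted triple.
    static final class Triple {
        final double confidence;
        final String subject;
        final String relation;
        final String object;

        Triple(double confidence, String subject, String relation, String object) {
            this.confidence = confidence;
            this.subject = subject;
            this.relation = relation;
            this.object = object;
        }
    }

    // Tab-separated style (assumed to resemble a default-like output).
    static String asTabSeparated(Triple t) {
        return String.format(Locale.ROOT, "%.3f\t%s\t%s\t%s",
                t.confidence, t.subject, t.relation, t.object);
    }

    // Ollie-like style: "confidence: (subject; relation; object)".
    // No [attrib=...] suffix is produced, since only a subset of the
    // Ollie format is being imitated here.
    static String asOllieStyle(Triple t) {
        return String.format(Locale.ROOT, "%.3f: (%s; %s; %s)",
                t.confidence, t.subject, t.relation, t.object);
    }

    public static void main(String[] args) {
        // Made-up sample triple, not output from any real run.
        Triple t = new Triple(1.0, "Barack Obama", "was born in", "Hawaii");
        System.out.println(asTabSeparated(t));
        System.out.println(asOllieStyle(t));
    }
}
```

Both methods receive the identical triple, so switching between them can never add an attribution clause or recover a negated fact that the extractor did not produce in the first place.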

Solution

  • So, Stanford OpenIE is not the same as Ollie; it just has an option to output in a format that is similar to (technically a subset of) the Ollie format.

    The Stanford OpenIE system is described in Angeli et al. "Leveraging Linguistic Structure For Open Domain Information Extraction". Ollie is described in Mausam et al. "Open Language Learning for Information Extraction".

    RE the missed extraction: Stanford's system models negation and false statements as a first-order phenomenon, so it won't extract negated facts. This is to avoid cases where the downstream application has to disambiguate between a negated relation and a non-negated relation (e.g., what if a relation is in a double-negative context?). Therefore, both because of the "some people say" modifier and because of the negation, the system doesn't return anything.

    RE the exception: you're missing the mention and coref annotators (mention,coref) in your annotators list. Are you calling this from the command line, or from the annotation pipeline? If from the command line, can you include the command you used to run the program?
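
As a rough sketch of the annotators fix, the snippet below builds the two property sets using only `java.util.Properties`. The class `OpenIEPropsSketch` and its `build` method are hypothetical names. The base list is the one quoted in the error message plus `openie`. The coref variant adds `mention,coref` as suggested above, and it also adds `ner` on the assumption that mention detection depends on it. The question does not say whether `openie.resolve_coref` was enabled, so setting it here is an assumption as well.

```java
import java.util.Properties;

// Sketch only: builds the configuration, does not run CoreNLP itself.
final class OpenIEPropsSketch {

    // The "usual requirements" from the error message, followed by openie.
    static final String BASE_ANNOTATORS =
            "tokenize,ssplit,pos,lemma,depparse,natlog,openie";

    // Assumption: ner, mention and coref must run before openie when
    // coreference resolution is requested.
    static final String COREF_ANNOTATORS =
            "tokenize,ssplit,pos,lemma,ner,depparse,natlog,mention,coref,openie";

    static Properties build(boolean resolveCoref) {
        Properties props = new Properties();
        props.setProperty("annotators",
                resolveCoref ? COREF_ANNOTATORS : BASE_ANNOTATORS);
        props.setProperty("openie.resolve_coref", String.valueOf(resolveCoref));
        return props;
    }

    public static void main(String[] args) {
        // Print both variants so the difference in the annotators list is visible.
        System.out.println(build(false));
        System.out.println(build(true));
        // With CoreNLP on the classpath, the result would be passed to
        // new StanfordCoreNLP(props); that step is left out here so the
        // sketch runs without the library.
    }
}
```

If coreference is not needed, the shorter list with `openie.resolve_coref` left false should avoid the `CorefChainAnnotation` requirement altogether, though this has not been verified against 3.9.2 here.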