google-cloud-platform named-entity-recognition google-cloud-automl

Nested Named Entity Recognition with Google Cloud NLP


We can perform simple named entity recognition by uploading complete PDF documents, tagging simple entities, and training a model.

But does the Google Cloud AutoML platform support nested named entity recognition?


Solution

  • Not by default. As far as I can tell, there is no standardized method for implementing nested named entity recognition, which may be part of the reason it isn't supported. Doing this in a single pass would require an annotation to contain other annotations, and the documentation rules that out:

    Each annotation can cover up to ten tokens (words). They cannot overlap; the start_offset of an annotation cannot be between the start_offset and end_offset of an annotation in the same document. [docs]

    You could, however, probably implement this yourself based on your own definition of nested NER. Train a primary model to extract the outer (containing) entities, then train a secondary model to extract the entities nested inside them. Run the secondary model only on the outputs of the primary model. You may also want to add conditions, such as a check on the number of tokens in each primary entity, before passing it to the secondary model.
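
    The two-stage idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not AutoML client code: `nested_ner`, `Entity`, `stub_primary`, and `stub_secondary` are hypothetical names, and the two regex-based stubs merely stand in for calls to your trained primary and secondary models. The `min_tokens` threshold is likewise an assumed example of a token-count condition.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A span returned by an extractor: (label, start, end). Offsets are relative
# to the text the extractor was given, with an exclusive end.
Span = Tuple[str, int, int]
Extractor = Callable[[str], List[Span]]


@dataclass
class Entity:
    label: str
    start: int  # character offset into the full document
    end: int    # exclusive
    text: str
    children: List["Entity"] = field(default_factory=list)


def nested_ner(document: str, primary: Extractor, secondary: Extractor,
               min_tokens: int = 2) -> List[Entity]:
    """Run `secondary` only inside the spans found by `primary`."""
    results = []
    for label, start, end in primary(document):
        outer_text = document[start:end]
        outer = Entity(label, start, end, outer_text)
        # Assumed condition: skip outer entities too short to contain a
        # nested entity. Whitespace splitting is a crude token count.
        if len(outer_text.split()) >= min_tokens:
            for in_label, in_start, in_end in secondary(outer_text):
                # Shift inner offsets back into document coordinates.
                outer.children.append(Entity(
                    in_label, start + in_start, start + in_end,
                    outer_text[in_start:in_end]))
        results.append(outer)
    return results


# Hypothetical stand-ins for the two trained models. In practice each would
# wrap a prediction request and convert the response into Span tuples.
def stub_primary(text: str) -> List[Span]:
    return [("ORGANIZATION", m.start(), m.end())
            for m in re.finditer(r"University of \w+", text)]


def stub_secondary(text: str) -> List[Span]:
    return [("LOCATION", m.start(), m.end())
            for m in re.finditer(r"(?<=of )\w+", text)]


if __name__ == "__main__":
    doc = "She studied at the University of Edinburgh in 2010."
    for entity in nested_ner(doc, stub_primary, stub_secondary):
        print(entity.label, repr(entity.text))
        for child in entity.children:
            print("   ", child.label, repr(child.text))
```

    Because the secondary model only ever sees the text of a primary entity, each model's training annotations stay non-overlapping, and the nesting is reconstructed afterwards by shifting the inner offsets.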