
Text index parameter in Concept Insights gives only one instance of a concept occurrence


I have multiple mentions of a concept (e.g. "Gold") in my document. However, Concept Insights' conceptual search returns Gold as a concept with only one text index for it (usually the first occurrence, but not always). Is there a specific reason why conceptual search ignores the other mentions of "Gold" in the document? I am interested in pulling snippets of text around all the occurrences of a concept, so it would be great to get the text index for every mention. Is there any way to get this, other than doing the string match on my end?

Thanks in advance for the help!


Solution

  • The conceptual search from Concept Insights does not ignore repeated mentions of the same (or a related) concept within a document. In fact, the service uses this information to reinforce the system's understanding of the concept areas that are covered in each document.

    However, it is true that in the "explanation" of why a document is related to your query, the /conceptual_search endpoint returns only a select set of concepts. Because the system is trying to show a diversity of concepts that justify the connection between your query and a document, it can omit repeated concepts from the "explanation". You can think of this "explanation" as akin to the snippet of text that a traditional search engine shows to suggest why a document may be relevant; it is not the complete story of the associations found within the document.

    That being said, you can get all the concepts extracted within a document by using the /annotations endpoint: GET /v2/corpora/{account_id}/{corpus}/documents/{document}/annotations.

    (Documentation: https://watson-api-explorer.mybluemix.net/apis/concept-insights-v2#!/corpora/getDocumentAnnotations)

    For every annotation in the document, you get the concept id along with the positions in the text where that concept occurs. So, for your example above, you can:

    1) Call the /conceptual_search endpoint to retrieve documents relevant to your query, along with a number of explanation concepts (concepts that tie the document to your query); say you found that the concept in question is Gold.

    2) Call /{document}/annotations for the returned document, looking for additional occurrences of the "explanation concepts" (Gold) within the selected document. You should be able to build a list of Gold occurrences (along with lists of other explanation concepts), which cover the entire document.
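
Step 2 above can be sketched in Python as follows. This is only a minimal illustration, not the service's client library: it works on an already-parsed JSON response from the annotations endpoint, and the field names (`annotations`, `concept`, `label`, `text_index`) as well as the possibly nested list layout are assumptions about the response shape that you should check against what the endpoint actually returns. The helper names `iter_annotations`, `concept_spans` and `snippets` are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_annotations(node: Any) -> Iterator[Dict[str, Any]]:
    """Yield annotation dicts from a flat or nested list (assumed layout)."""
    if isinstance(node, dict):
        yield node
    elif isinstance(node, list):
        for child in node:
            yield from iter_annotations(child)


def concept_spans(response: Dict[str, Any], concept_label: str) -> List[Tuple[int, int]]:
    """Collect every (start, end) text index annotated with the given concept.

    Assumes each annotation looks like:
    {"concept": {"label": "Gold", ...}, "text_index": [start, end]}
    """
    spans = []
    for ann in iter_annotations(response.get("annotations", [])):
        concept = ann.get("concept", {})
        index = ann.get("text_index")
        if concept.get("label") == concept_label and index and len(index) == 2:
            spans.append((index[0], index[1]))
    return sorted(spans)


def snippets(text: str, spans: List[Tuple[int, int]], window: int = 20) -> List[str]:
    """Return the text around each span, padded by `window` characters."""
    out = []
    for start, end in spans:
        lo = max(0, start - window)
        hi = min(len(text), end + window)
        out.append(text[lo:hi])
    return out
```

With the explanation concept from step 1 (for example "Gold"), `concept_spans(response, "Gold")` gives all annotated positions in the document, and `snippets(document_text, spans)` pulls the surrounding text for each one. Matching on the concept id instead of the label would be the stricter choice if the response exposes it.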