I have a list of documents that will be uploaded online at different points in time. I have no prior information about their contents or about the labels that could be assigned to them, and I have no historical data, so I can't train a classifier with the Watson Natural Language Classifier service. What I want is real-time categorization / topic assignment for these documents. For example, I am looking for an API like the following:
service.getTopics('some text')
which returns something like the following in real time:
{
  "categories": [
    {
      "score": 0.949576,
      "label": "/technology and computing/networking"
    },
    {
      "score": 0.911692,
      "label": "/technology and computing/networking/network monitoring and management"
    },
    {
      "score": 0.879639,
      "label": "/business and industrial/business operations/management"
    }
  ]
}
Is this possible with Watson Discovery or the NLU service? I am using the Python SDK, so an example or any relevant link would be very helpful. Thanks!
I think the categories or concepts features of the Watson Natural Language Understanding service are the best fit. You can't upload a document file (for example a PDF or Word file) to the API directly, so you would need to extract the text first, but if you are able to do that, then:
Example adapted from the API documentation:
import json

from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import Features, ConceptsOptions, CategoriesOptions

# Replace {apikey} and {url} with the credentials of your NLU service instance
authenticator = IAMAuthenticator('{apikey}')
natural_language_understanding = NaturalLanguageUnderstandingV1(
    version='2019-07-12',
    authenticator=authenticator)
natural_language_understanding.set_service_url('{url}')

# Ask for up to 5 categories and up to 5 concepts for the given text
response = natural_language_understanding.analyze(
    text='IBM is an American multinational technology company '
         'headquartered in Armonk, New York, United States, '
         'with operations in over 170 countries.',
    features=Features(
        categories=CategoriesOptions(limit=5),
        concepts=ConceptsOptions(limit=5))).get_result()

# get_result() returns a plain dict; 'categories' holds label/score pairs
print(json.dumps(response, indent=2))
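To get output shaped like the one in the question, the result dictionary can be post-processed. This is a minimal sketch that assumes the response layout where each entry under `categories` has `label` and `score` keys and each entry under `concepts` has `text` and `relevance` keys. The helper names `top_categories` / `top_concepts` and the 0.5 default thresholds are hypothetical choices for illustration, not part of the SDK.

```python
from typing import Any, Dict, List, Tuple


def top_categories(result: Dict[str, Any], min_score: float = 0.5) -> List[Tuple[str, float]]:
    """Return (label, score) pairs from an NLU analyze() result, best first.

    Hypothetical helper; the 0.5 default threshold is an arbitrary assumption.
    """
    pairs = [
        (cat["label"], cat["score"])
        for cat in result.get("categories", [])
        if cat.get("score", 0.0) >= min_score
    ]
    # Sort by score, highest first
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def top_concepts(result: Dict[str, Any], min_relevance: float = 0.5) -> List[str]:
    """Return concept names whose relevance meets the (assumed) threshold."""
    return [
        concept["text"]
        for concept in result.get("concepts", [])
        if concept.get("relevance", 0.0) >= min_relevance
    ]
```

With the three categories from the question, `top_categories(result, min_score=0.9)` keeps only the two networking labels, because the management label scores 0.879639. Applying a threshold on your side is one way to decide how many topics to attach to each uploaded document.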
More information is in the API documentation - https://cloud.ibm.com/apidocs/natural-language-understanding/natural-language-understanding?code=python#categories
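The text-extraction step depends on the file format. As a minimal sketch, assuming the uploads are .docx files, the standard library is enough: a .docx is a zip archive whose body text lives in `word/document.xml`. The helper name `docx_to_text` is hypothetical; headers, footers and footnotes (stored in other parts of the archive) are ignored, and PDFs would need a third-party library instead. If the documents are web pages, `analyze()` also accepts `html=` or `url=` in place of `text=`.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path_or_file) -> str:
    """Extract paragraph text from a .docx file (hypothetical helper).

    Accepts a filesystem path or a binary file-like object. Only the main
    document part is read; headers, footers and footnotes are skipped.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

The returned string can then be passed as `text=` to `analyze()` as each document arrives.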