python machine-learning nlp ibm-watson topic-modeling

Topic modeling example with Watson SDK API


I have a list of documents that will be uploaded online at different points in time. I have no prior information about their contents or about the labels that could be assigned to them, and I have no historical data, so I can't train a classifier with the Watson Natural Language Classifier service. What I want is real-time categorization or topic assignment for these documents. For example, I am looking for an API like the following:

    service.getTopics('some text')

that returns something like the following in real time:

    "categories": [
      {
        "score": 0.949576,
        "label": "/technology and computing/networking"
      },
      {
        "score": 0.911692,
        "label": "/technology and computing/networking/network monitoring and management"
      },
      {
        "score": 0.879639,
        "label": "/business and industrial/business operations/management"
      }
    ]
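In this format each `label` is a slash-separated path through a category hierarchy, most general level first. A minimal sketch of reducing such a list to one score per top-level category follows; `top_level_scores` is a hypothetical helper (not part of any Watson SDK), and keeping the highest score under each top-level category is an assumed aggregation rule.

```python
from collections import defaultdict


def top_level_scores(categories):
    """Hypothetical helper: map each top-level category (first path segment
    of "label") to the highest "score" among the entries under it."""
    best = defaultdict(float)
    for entry in categories:
        # "/technology and computing/networking" -> ["technology and computing", "networking"]
        levels = [part for part in entry["label"].split("/") if part]
        if not levels:
            continue  # skip malformed or empty labels
        top = levels[0]
        best[top] = max(best[top], entry["score"])
    return dict(best)


# The sample entries from the question above
categories = [
    {"score": 0.949576, "label": "/technology and computing/networking"},
    {"score": 0.911692, "label": "/technology and computing/networking/network monitoring and management"},
    {"score": 0.879639, "label": "/business and industrial/business operations/management"},
]
print(top_level_scores(categories))
# {'technology and computing': 0.949576, 'business and industrial': 0.879639}
```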

Is this possible with the Watson Discovery or Natural Language Understanding (NLU) service? I am using the Python SDK; an example or any relevant link would be very helpful. Thanks.


Solution

  • I think the categories or concepts features of the Watson Natural Language Understanding service are the best fit. You can't send a document file directly through the API, so you would need to extract its text first. If you can do that, then:

    Example adapted from the API documentation page:

    import json

    from ibm_watson import NaturalLanguageUnderstandingV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
    from ibm_watson.natural_language_understanding_v1 import (
        Features, ConceptsOptions, CategoriesOptions)

    # Replace {apikey} and {url} with the credentials of your NLU instance
    authenticator = IAMAuthenticator('{apikey}')
    natural_language_understanding = NaturalLanguageUnderstandingV1(
        version='2019-07-12',
        authenticator=authenticator)

    natural_language_understanding.set_service_url('{url}')

    # Request up to 5 categories and up to 5 concepts for the given text
    response = natural_language_understanding.analyze(
        text='IBM is an American multinational technology company '
        'headquartered in Armonk, New York, United States, '
        'with operations in over 170 countries.',
        features=Features(
            categories=CategoriesOptions(limit=5),
            concepts=ConceptsOptions(limit=5))).get_result()

    # get_result() returns the parsed JSON response as a dict
    print(json.dumps(response, indent=2))

    More information is in the API documentation - https://cloud.ibm.com/apidocs/natural-language-understanding/natural-language-understanding?code=python#categories
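Since you asked for something like `service.getTopics('some text')`, here is a minimal sketch that flattens the dict returned by `.get_result()` into simple (topic, score) lists. `extract_topics` and its `min_score` threshold are hypothetical, not part of the Watson SDK; the sketch assumes each category entry has `label` and `score` and each concept entry has `text` and `relevance`, as in the NLU response. The two kinds are kept separate because a category score and a concept relevance measure different things.

```python
def extract_topics(result, min_score=0.5):
    """Hypothetical helper: turn an NLU analyze() result dict into
    {"categories": [(label, score)], "concepts": [(text, relevance)]},
    dropping entries below min_score and sorting each list high to low."""
    categories = [
        (c["label"], c["score"])
        for c in result.get("categories", [])
        if c["score"] >= min_score
    ]
    concepts = [
        (c["text"], c["relevance"])
        for c in result.get("concepts", [])
        if c["relevance"] >= min_score
    ]
    # Sort explicitly rather than relying on the order the service returns
    categories.sort(key=lambda pair: pair[1], reverse=True)
    concepts.sort(key=lambda pair: pair[1], reverse=True)
    return {"categories": categories, "concepts": concepts}
```

With the `response` from the example above, `extract_topics(response)` gives a compact per-document topic list, and a higher `min_score` keeps only the stronger matches. Missing features simply yield empty lists.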