Tags: google-cloud-platform, nlp, google-natural-language

Unsure how to resolve language error message from Google's Natural Language API: "The language sq is not supported for document_sentiment analysis."


I have an app that's been working for months and is now giving me an error.

The app takes tweets from the Twitter API and runs them through Google's Sentiment Analysis API, returning sentiment analysis on each of the tweets.

Without changing the code, I'm suddenly getting an error that hasn't happened before.

Error message

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "The language sq is not supported for document_sentiment analysis."
    debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.69.95:443 {grpc_message:"The language sq is not supported for document_sentiment analysis.", grpc_status:3, created_time:"2022-12-24T04:26:31.031735656+00:00"}"
>

Interpretation

Even though I'm requesting only English-language tweets in my Twitter API query (-is:retweet lang:en), my understanding of the error message is that the NL API thinks the text is in a language with the code sq. My research says that's the ISO 639-1 code for Albanian.

So my assumption is that the NL API is interpreting some block(s) of text in the tweets as being in Albanian, or maybe it's just a portion of an otherwise English tweet that contains some Albanian.

Question

Is there a way to ignore or skip a text if the API can't process the language the text is in?

This is the language_v1 call:

from google.cloud import language_v1

def get_single_sentiment(text):
    '''gets non-entity sentiment of text using GCP's API'''

    # Instantiate a client
    client = language_v1.LanguageServiceClient()

    # The text to analyze
    document = language_v1.Document(content=text, type_=language_v1.types.Document.Type.PLAIN_TEXT)

    # Detect the sentiment of the text
    sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment

    return sentiment

Below is the full error message being returned when trying to run the sentiment analysis:

---------------------------------------------------------------------------
_InactiveRpcError                         Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     56         try:
---> 57             return callable_(*args, **kwargs)
     58         except grpc.RpcError as exc:

/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
    945                                       wait_for_ready, compression)
--> 946         return _end_unary_response_blocking(state, call, False, None)
    947 

/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    848     else:
--> 849         raise _InactiveRpcError(state)
    850 

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "The language sq is not supported for document_sentiment analysis."
    debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.69.95:443 {grpc_message:"The language sq is not supported for document_sentiment analysis.", grpc_status:3, created_time:"2022-12-24T04:26:31.031735656+00:00"}"
>

The above exception was the direct cause of the following exception:

InvalidArgument                           Traceback (most recent call last)
/tmp/ipykernel_1/1103340548.py in <module>
      1 twitter_stage(QUERY_TW, N_HOURS_AGO
----> 2               , TWITTER_BQ_TABLE, ENTITY)

/tmp/ipykernel_1/2800777156.py in twitter_stage(QUERY, N_HOURS_AGO, TWITTER_BQ_TABLE, ENTITY)
     39 
     40         # get sentiment analysis
---> 41         twitapi_df = get_column_sentiment(twitapi_df, text_col='text', entity=ENTITY, query=QUERY)
     42 
     43         # Dropping columns that can't be saved to big query because they are not compatible

/tmp/ipykernel_1/2183820933.py in get_column_sentiment(df, text_col, entity, query)
    110 
    111     # for each entry in text_col, get a single sentiment result
--> 112     sentiment_column = df[text_col].apply(f)
    113 
    114     # for each entry in sentiment_column, fix null values (replace nulls will two values)

/opt/conda/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwargs)
   4355         dtype: float64
   4356         """
-> 4357         return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
   4358 
   4359     def _reduce(

/opt/conda/lib/python3.7/site-packages/pandas/core/apply.py in apply(self)
   1041             return self.apply_str()
   1042 
-> 1043         return self.apply_standard()
   1044 
   1045     def agg(self):

/opt/conda/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
   1099                     values,
   1100                     f,  # type: ignore[arg-type]
-> 1101                     convert=self.convert_dtype,
   1102                 )
   1103 

/opt/conda/lib/python3.7/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

/tmp/ipykernel_1/2183820933.py in get_single_sentiment(text)
     16 
     17     # Detects the sentiment of the text
---> 18     sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
     19 
     20     return sentiment

/opt/conda/lib/python3.7/site-packages/google/cloud/language_v1/services/language_service/client.py in analyze_sentiment(self, request, document, encoding_type, retry, timeout, metadata)
    509             retry=retry,
    510             timeout=timeout,
--> 511             metadata=metadata,
    512         )
    513 

/opt/conda/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py in __call__(self, timeout, retry, *args, **kwargs)
    152             kwargs["metadata"] = metadata
    153 
--> 154         return wrapped_func(*args, **kwargs)
    155 
    156 

/opt/conda/lib/python3.7/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
    286                 sleep_generator,
    287                 self._deadline,
--> 288                 on_error=on_error,
    289             )
    290 

/opt/conda/lib/python3.7/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
    188     for sleep in sleep_generator:
    189         try:
--> 190             return target()
    191 
    192         # pylint: disable=broad-except

/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     57             return callable_(*args, **kwargs)
     58         except grpc.RpcError as exc:
---> 59             raise exceptions.from_grpc_error(exc) from exc
     60 
     61     return error_remapped_callable

InvalidArgument: 400 The language sq is not supported for document_sentiment analysis.

Proposed Solution

I'm thinking the best solution is to skip any text that isn't in English. Is that a reasonable approach, and does anyone have input on how to implement it?
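One way to approach the "skip it" idea is to treat the 400 InvalidArgument as a signal that this particular text can't be analyzed, and return None instead of letting the exception propagate. Here is a minimal sketch; the helper names (`sentiment_or_none`, `fake_analyze`) are mine, and the exception class is injected as a parameter so the skip logic can be exercised without GCP credentials. In the real app, `skip_errors` would be `google.api_core.exceptions.InvalidArgument` and `analyze` would wrap the `client.analyze_sentiment(...)` call:

```python
def sentiment_or_none(text, analyze, skip_errors):
    """Run analyze(text); return None instead of raising for expected errors.

    analyze     -- callable returning a sentiment result for one text
    skip_errors -- exception class(es) that mean "skip this text"; in the
                   real app, google.api_core.exceptions.InvalidArgument
    """
    try:
        return analyze(text)
    except skip_errors:
        return None


# Stand-in for the real GCP call: raises for unsupported languages,
# mimicking the 400 InvalidArgument seen for Albanian ("sq") tweets.
def fake_analyze(text):
    if "#shqip" in text:
        raise ValueError("400 The language sq is not supported")
    return 0.8

print(sentiment_or_none("great product!", fake_analyze, ValueError))      # 0.8
print(sentiment_or_none("Update| #shqip tweet", fake_analyze, ValueError))  # None
```

Inside `get_column_sentiment`, this would let `df[text_col].apply(...)` run to completion, after which the rows that came back as None can be dropped with `dropna`.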

Greatly appreciate any input on resolving this. Thanks.

Tweet Content Causing Problem

Update| #shqip #shqiperi #kosova #albania #kosovo #shqiptar #shqiptare #lajme #shqiperia #tirana #prishtina #visitalbania #albanian #tirane #albaniangirl #shqipe…

Solution

  • The issue can be resolved by explicitly specifying the document language in the code, i.e. set the language to "en", define the type_, then pass both when building the document. This prevents the API from auto-detecting the language.

    For example :

    type_ = language_v1.Document.Type.PLAIN_TEXT
    language = "en"
    document = {"type_": type_, "content": content, "language": language}
    

    Sample code:

    import six
    from google.cloud import language_v1

    def sample_analyze_sentiment(content):

        client = language_v1.LanguageServiceClient()

        if isinstance(content, six.binary_type):
            content = content.decode("utf-8")

        type_ = language_v1.Document.Type.PLAIN_TEXT
        language = "en"
        document = {"type_": type_, "content": content, "language": language}

        response = client.analyze_sentiment(request={"document": document})
        sentiment = response.document_sentiment
        print("Score: {}".format(sentiment.score))
        print("Magnitude: {}".format(sentiment.magnitude))
        return sentiment
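Since the only functional change is the extra "language" key in the request document, the shape of that document can be checked locally without credentials. A small sketch; the helper `build_sentiment_document` is mine, and the `PLAIN_TEXT = 1` constant hardcodes the value of the `language_v1.Document.Type.PLAIN_TEXT` enum so the snippet runs without the client library:

```python
PLAIN_TEXT = 1  # value of language_v1.Document.Type.PLAIN_TEXT

def build_sentiment_document(text, language="en"):
    """Build the analyze_sentiment request document with the language pinned,
    so the API never falls back to auto-detection (which flagged 'sq')."""
    return {"type_": PLAIN_TEXT, "content": text, "language": language}

doc = build_sentiment_document("Update| #shqip #shqiperi #kosova tweet text")
print(doc["language"])  # prints "en"
```

The resulting dict is what gets passed as `request={"document": document}`; with `"language": "en"` present, the Albanian-hashtag tweet no longer triggers the InvalidArgument error (the API analyzes it as English instead of rejecting it).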