Tags: python-3.x, google-cloud-platform, google-cloud-automl

GCP AutoML TextSnippet longer than 10,000 characters


I've been using the GCP AutoML Python library version 2.2.0 for text extraction, and it usually works perfectly. However, it sometimes gives me this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/wr/api/simple_text_nlp.py", line 159, in extract_entity_from_text
    predict_data, predict_id, predict_error = extract_entity.predict(text, model_path)
  File "/usr/local/lib/python3.6/site-packages/ProfessorPatPending/ocr/textOperations.py", line 68, in predict
    response = self.__client.predict(name=model_path, payload=payload)
  File "/usr/local/lib/python3.6/site-packages/google/cloud/automl_v1/services/prediction_service/client.py", line 498, in predict
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/usr/local/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 List of found errors:   1.Field: payload.text_snippet.content; Message: The provided string field value is longer than 10000: 10796

The TextSnippet in question has more than 10,000 characters, but the documentation clearly states that it can be up to 250,000 characters. Can someone explain what's going on?

The code to create the text snippet is:

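from google.cloud import automl

# Build the prediction client from a service-account key file.
client = automl.PredictionServiceClient.from_service_account_file(sa_json_file)
# Wrap the raw text in a TextSnippet and send it to the deployed model.
text_snippet = automl.TextSnippet(content=text_data, mime_type="text/plain")
payload = automl.ExamplePayload(text_snippet=text_snippet)
response = client.predict(name=model_path, payload=payload)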

For obvious reasons, I won't post the text_data itself here.

Thank you.


Solution

  • The error is raised by client.predict() because you are sending a TextSnippet longer than 10,000 characters. AutoML Entity Extraction is limited to 10,000 characters per prediction request.

    AutoML Natural Language Entity Extraction

    • A TextSnippet of up to 10,000 characters, UTF-8 NFC encoded, or a document in .PDF, .TIF or .TIFF format with a size of up to 20MB.

    I suggest that you split your TextSnippet and send multiple requests, or trim the TextSnippet to 10,000 characters to satisfy the character limit.
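The split-and-send approach can be sketched roughly as follows. This is a minimal sketch, not an official AutoML helper: `split_text`, `extract_entities`, `Entity`, `ChunkEntity` and `MAX_SNIPPET_CHARS` are hypothetical names. It assumes the service counts the 10,000-character limit on NFC-normalized text the same way Python's `len()` counts code points, and that the returned `text_segment` offsets are character offsets relative to the chunk that was sent. Each chunk is cut after the last space or newline before the limit so single words stay intact, but an entity spanning several words can still be split across two chunks and be missed or truncated.

```python
import unicodedata
from typing import Callable, Iterable, List, NamedTuple, Tuple

# Per-request character limit quoted above for AutoML Entity Extraction.
MAX_SNIPPET_CHARS = 10_000

# Hypothetical tuple shape: (label, start_offset, end_offset, content, score)
# for one entity found in one chunk.
ChunkEntity = Tuple[str, int, int, str, float]


class Entity(NamedTuple):
    label: str
    start: int  # offset into the full NFC-normalized text
    end: int  # exclusive end offset into the same text
    content: str
    score: float


def split_text(text: str, max_chars: int = MAX_SNIPPET_CHARS) -> List[Tuple[int, str]]:
    """Split text into (offset, chunk) pairs with len(chunk) <= max_chars.

    Cuts after the last space or newline inside each window so words stay
    whole; falls back to a hard cut when the window contains neither.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the separator at the end of this chunk
        chunks.append((start, text[start:end]))
        start = end
    return chunks


def extract_entities(
    text: str,
    predict_chunk: Callable[[str], Iterable[ChunkEntity]],
    max_chars: int = MAX_SNIPPET_CHARS,
) -> List[Entity]:
    """Call predict_chunk once per chunk and shift offsets back to the full text."""
    # Normalize first so chunk lengths and offsets refer to NFC text.
    normalized = unicodedata.normalize("NFC", text)
    entities = []
    for offset, chunk in split_text(normalized, max_chars):
        for label, start, end, content, score in predict_chunk(chunk):
            entities.append(Entity(label, offset + start, offset + end, content, score))
    return entities


# Hypothetical wiring to the question's client (needs google-cloud-automl and
# credentials, so it is left commented out):
#
# from google.cloud import automl
#
# def predict_chunk(chunk):
#     snippet = automl.TextSnippet(content=chunk, mime_type="text/plain")
#     payload = automl.ExamplePayload(text_snippet=snippet)
#     response = client.predict(name=model_path, payload=payload)
#     for annotation in response.payload:
#         segment = annotation.text_extraction.text_segment
#         yield (annotation.display_name, segment.start_offset,
#                segment.end_offset, segment.content,
#                annotation.text_extraction.score)
#
# entities = extract_entities(text_data, predict_chunk)
```

Passing `predict_chunk` in as a callable keeps the chunking and offset bookkeeping independent of the AutoML client, so it can be checked without network access. If losing the end of the document is acceptable, the trimming alternative is just slicing the normalized text to the first 10,000 characters; chunking keeps all of the text at the cost of one request per chunk.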