I've been using version 2.2.0 of the GCP AutoML Python library for text extraction, and it usually works perfectly. However, it sometimes gives me this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/wr/api/simple_text_nlp.py", line 159, in extract_entity_from_text
predict_data, predict_id, predict_error = extract_entity.predict(text, model_path)
File "/usr/local/lib/python3.6/site-packages/ProfessorPatPending/ocr/textOperations.py", line 68, in predict
response = self.__client.predict(name=model_path, payload=payload)
File "/usr/local/lib/python3.6/site-packages/google/cloud/automl_v1/services/prediction_service/client.py", line 498, in predict
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
File "/usr/local/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 75, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 List of found errors: 1.Field: payload.text_snippet.content; Message: The provided string field value is longer than 10000: 10796
The TextSnippet in question has more than 10,000 characters; however, the documentation clearly states that it can be up to 250,000 characters. Can someone explain what's going on?
The code to create the text snippet is:
client = automl.PredictionServiceClient.from_service_account_file(sa_json_file)
text_snippet = automl.TextSnippet(content=text_data, mime_type="text/plain")
payload = automl.ExamplePayload(text_snippet=text_snippet)
response = client.predict(name=model_path, payload=payload)
For obvious reasons, I won't post the text_data itself here.
Thank you.
The error is raised by client.predict() because you are sending a TextSnippet longer than 10,000 characters. AutoML Entity Extraction is limited to 10,000 characters per prediction request.
AutoML Natural Language Entity Extraction
- A TextSnippet up to 10,000 characters, UTF-8 NFC encoded, or a document in .PDF, .TIF or .TIFF format with a size of up to 20 MB.
I suggest that you either split your TextSnippet and send multiple requests, or trim the TextSnippet to 10,000 characters to satisfy the character limit.
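The splitting approach could look roughly like the sketch below. The helper names split_text and predict_in_chunks are hypothetical (they are not part of the AutoML library), and the sketch assumes the limit is counted the same way Python's len() counts characters. It cuts at whitespace where possible so that words are not split in half, and it reuses the same TextSnippet, ExamplePayload and client.predict() calls as in the question.

```python
import unicodedata

MAX_CHARS = 10000  # limit quoted in the error message


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: prefers to cut just after the last whitespace
    inside each window and falls back to a hard cut if there is none.
    """
    # The docs quoted above ask for NFC-encoded text, so normalize first.
    text = unicodedata.normalize("NFC", text)
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Position of the last space, newline or tab in the window.
            cut = max(text.rfind(ws, start, end) for ws in (" ", "\n", "\t"))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left chunk
        chunks.append(text[start:end])
        start = end
    return chunks


def predict_in_chunks(client, model_path, text):
    """Send one predict request per chunk and return all responses.

    Hypothetical helper; `client` is the PredictionServiceClient
    created as in the question.
    """
    from google.cloud import automl  # same library as in the question

    responses = []
    for chunk in split_text(text):
        snippet = automl.TextSnippet(content=chunk, mime_type="text/plain")
        payload = automl.ExamplePayload(text_snippet=snippet)
        responses.append(client.predict(name=model_path, payload=payload))
    return responses
```

With the client from the question, this would be called as predict_in_chunks(client, model_path, text_data). Keep in mind that an entity straddling a chunk boundary may be missed, and any character offsets in the responses would presumably be relative to each chunk rather than to the full text, so you may need to shift them by the chunk's starting position.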