I'm sending online prediction requests from Google App Engine to Google Cloud ML (I didn't create the model), and from time to time I get the exception "Deadline exceeded while waiting for HTTP response from URL". Full trace:
Deadline exceeded while waiting for HTTP response from URL: https://ml.googleapis.com/v1/projects/project-id/models/my-model/versions/v3:predict?alt=json (/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py:1552)
Traceback (most recent call last):
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/main.py", line 90, in post
    response = predict(batch_obj=batch_data_obj)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/run_cloud_predict.py", line 88, in predict
    response = request.execute()
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/oauth2client/util.py", line 135, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/googleapiclient/http.py", line 835, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/googleapiclient/http.py", line 162, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/oauth2client/client.py", line 631, in new_request
    redirections, connection_type)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/httplib2/__init__.py", line 1659, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/httplib2/__init__.py", line 1399, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/httplib2/__init__.py", line 1355, in _conn_request
    response = conn.getresponse()
  File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/gae_override/httplib.py", line 526, in getresponse
    raise HTTPException(str(e))
HTTPException: Deadline exceeded while waiting for HTTP response from URL: https://ml.googleapis.com/v1/projects/project-id/models/my-model/versions/v3:predict?alt=json
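I know that Google App Engine has a 60-second limit for responding to a request, which is why I make these requests from a task queue. I also tried the following:
import socket
from google.appengine.api import urlfetch

URLFETCH_DEADLINE = 3600  # seconds
urlfetch.set_default_fetch_deadline(URLFETCH_DEADLINE)  # default deadline for URL Fetch calls
socket.setdefaulttimeout(URLFETCH_DEADLINE)  # default timeout for new sockets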
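I'm constructing the API client like this:
import httplib2
from googleapiclient import discovery
from oauth2client import service_account

# scopes, predict_ver_name and request_data are defined elsewhere (not shown)
credentials = service_account.ServiceAccountCredentials.from_json_keyfile_name('credentials-file', scopes)
http = httplib2.Http(timeout=36000)  # socket timeout in seconds for httplib2
http = credentials.authorize(http)   # wrap the Http object with OAuth2 credentials
ml = discovery.build('ml', 'v1', http=http)
request = ml.projects().predict(name=predict_ver_name, body=request_data)
response = request.execute()  # the call that raises HTTPException in the trace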
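The `body` passed to `ml.projects().predict()` carries the inputs in an `instances` list. If one task sends a large batch, one possible mitigation (an assumption, not a confirmed fix) is to split it into several smaller predict calls so that each call finishes well inside the deadline. A minimal sketch, assuming `request_data` has the form `{"instances": [...]}`; `chunk_predict_bodies` and `max_instances` are hypothetical names:

```python
def chunk_predict_bodies(request_data, max_instances):
    """Split a {"instances": [...]} body into several smaller bodies.

    Hypothetical helper: assumes the body holds an "instances" list;
    any other top-level keys are copied unchanged into every chunk.
    """
    if max_instances < 1:
        raise ValueError("max_instances must be >= 1")
    instances = request_data["instances"]
    bodies = []
    for start in range(0, len(instances), max_instances):
        body = dict(request_data)  # shallow copy keeps any extra keys
        body["instances"] = instances[start:start + max_instances]
        bodies.append(body)
    return bodies


if __name__ == "__main__":
    data = {"instances": [{"x": i} for i in range(5)]}
    for body in chunk_predict_bodies(data, 2):
        print(len(body["instances"]))  # prints 2, 2, 1
```

Each smaller body would then go through its own `ml.projects().predict(name=predict_ver_name, body=body).execute()` call, and the per-call `predictions` lists would be concatenated in order.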
Interestingly, the timeout sometimes happens at around 70 s (69.9, 70, 70.1, etc.) and sometimes at around 120 s (119.8, 120.1, etc.), which suggests that it may have more to do with some internal Cloud ML deadline. I'm sending a few tens of requests in parallel through the task queue. Successful response times range from a few seconds to ~110 s. I'm curious whether anybody has had a similar experience or can advise how to solve this, i.e. what is causing the deadlines.
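Since the failures look intermittent, retrying the call with exponential backoff is one common workaround; it is a general technique, not something the Cloud ML service is confirmed to need here. A minimal sketch, assuming the error surfaces as the `HTTPException` shown in the trace; `call_with_backoff` and its parameters are hypothetical, and `sleep` is injectable so the logic can be tested without waiting:

```python
import random
import time


def call_with_backoff(func, retry_on, max_attempts=4, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call func(); if it raises one of retry_on, wait and try again.

    The wait before retry n (0-based) is drawn uniformly from
    [0, min(max_delay, base_delay * 2**n)] ("full jitter"), so parallel
    tasks do not retry in lockstep. After max_attempts failures the last
    exception is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the original error
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

In the handler this might look like `response = call_with_backoff(request.execute, retry_on=(httplib.HTTPException,))`. Keep in mind that every retry adds to the task's total run time, and that a task whose handler raises is retried by the push queue anyway.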
Thanks for posting your experience.
- There is some startup cost, and depending on the rate of requests, more than one server may need to be brought up to serve the load.
- What is the size of the model you are trying to predict on? Larger models tend to have larger startup costs.
Thanks.
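Because the answer ties the startup cost to the request rate, one thing worth trying (an assumption, not something the answer confirms) is to spread the few tens of parallel tasks out over time instead of enqueueing them as one burst. A minimal sketch that computes a per-task delay in seconds; `stagger_countdowns` and `tasks_per_second` are hypothetical names, and each value would be passed as the `countdown` argument of `taskqueue.add()`:

```python
def stagger_countdowns(num_tasks, tasks_per_second):
    """Return a countdown (seconds from now) for each of num_tasks tasks.

    Tasks are released in groups of tasks_per_second, one group per
    second, so the prediction service sees a steadier request rate
    instead of a single burst.
    """
    if tasks_per_second <= 0:
        raise ValueError("tasks_per_second must be positive")
    return [i // tasks_per_second for i in range(num_tasks)]


if __name__ == "__main__":
    print(stagger_countdowns(7, 3))  # [0, 0, 0, 1, 1, 1, 2]
```

The same pacing can also be configured declaratively through the queue's `rate` and `max_concurrent_requests` settings in `queue.yaml`, which avoids computing delays in code.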