google-app-engine google-api-python-client google-cloud-ml

Requests to Google Cloud ML time out

I'm sending online prediction requests from Google App Engine to Google Cloud ML (I didn't create the model), and from time to time I get the exception "Deadline exceeded while waiting for HTTP response from URL". Full trace:

    Deadline exceeded while waiting for HTTP response from URL: https://ml.googleapis.com/v1/projects/project-id/models/my-model/versions/v3:predict?alt=json (/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py:1552)
Traceback (most recent call last):
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/main.py", line 90, in post
    response = predict(batch_obj=batch_data_obj)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/run_cloud_predict.py", line 88, in predict
    response = request.execute()
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/oauth2client/util.py", line 135, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/googleapiclient/http.py", line 835, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/googleapiclient/http.py", line 162, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/oauth2client/client.py", line 631, in new_request
    redirections, connection_type)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/httplib2/__init__.py", line 1659, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/httplib2/__init__.py", line 1399, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/httplib2/__init__.py", line 1355, in _conn_request
    response = conn.getresponse()
  File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/gae_override/httplib.py", line 526, in getresponse
    raise HTTPException(str(e))
HTTPException: Deadline exceeded while waiting for HTTP response from URL: https://ml.googleapis.com/v1/projects/project-id/models/my-model/versions/v3:predict?alt=json

I know that Google App Engine has a 60-second limit for responses, which is why I am making the requests from within a task queue. I also tried the following:

import socket
from google.appengine.api import urlfetch

URLFETCH_DEADLINE = 3600  # seconds
urlfetch.set_default_fetch_deadline(URLFETCH_DEADLINE)
socket.setdefaulttimeout(URLFETCH_DEADLINE)

I am constructing the API client like this:

import httplib2
from googleapiclient import discovery
from oauth2client import service_account

credentials = service_account.ServiceAccountCredentials.from_json_keyfile_name('credentials-file', scopes)
http = httplib2.Http(timeout=36000)
http = credentials.authorize(http)

ml = discovery.build('ml', 'v1', http=http)
request = ml.projects().predict(name=predict_ver_name, body=request_data)

It's interesting that the timeout sometimes happens at around 70 s (69.9, 70, 70.1, etc.) and sometimes at around 120 s (119.8, 120.1, etc.), which suggests to me that this may have more to do with some internal Cloud ML deadline. I am making a few tens of requests in parallel through the task queue. Successful response times range from a few seconds to ~110 s. I'm curious whether anybody has had a similar experience or can advise me on how to solve this, i.e. what is causing the deadlines.
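
One way to confirm the clustering described above is to time every call explicitly and log the elapsed time together with the outcome. This is only a minimal sketch: `timed_call` is a hypothetical helper (not part of any Google library), and it assumes the call being measured is something like `request.execute`.

```python
import time


def timed_call(fn, clock=time.time):
    """Run fn() and return (result, error, elapsed_seconds).

    Hypothetical helper for illustration only. Exactly one of
    result/error is None, so failures keep their timing as well.
    """
    start = clock()
    try:
        return fn(), None, clock() - start
    except Exception as exc:
        # Keep the exception so the caller can log it next to the duration.
        return None, exc, clock() - start
```

Usage would look like `response, error, elapsed = timed_call(request.execute)`, after which `elapsed` can be logged to see whether failures really group around 70 s and 120 s.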

Solution

Thanks for posting your experience.

- There is some startup cost, and depending on the rate of requests, more than one server may need to be brought up to serve the load.
- What is the size of the model you are running predictions on? Larger models tend to have larger startup costs.
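
If the timeouts do come from startup cost on the serving side, retrying a failed call after a short pause may be enough to ride them out. The following is a minimal sketch under stated assumptions, not a confirmed fix: `execute_with_retries` is a hypothetical helper (not part of googleapiclient), and it assumes that sending the same prediction request again is safe for your application.

```python
import random
import time

try:
    from httplib import HTTPException  # Python 2.7 (App Engine standard)
except ImportError:
    from http.client import HTTPException  # Python 3


def execute_with_retries(execute, max_attempts=3, base_delay=1.0,
                         sleep=time.sleep):
    """Call execute() and retry on HTTPException with exponential backoff.

    Hypothetical helper for illustration only. The last failure is
    re-raised once max_attempts calls have been made.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return execute()
        except HTTPException:
            if attempt == max_attempts:
                raise
            # Wait base_delay, 2 * base_delay, ... plus up to 0.5 s of jitter
            # so that parallel tasks do not all retry at the same moment.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5))
```

Usage would look like `response = execute_with_retries(request.execute)`, assuming the request object can be executed again after a failure. `execute()` in googleapiclient also accepts a `num_retries` argument, though whether it covers this particular exception is something to verify.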

Thanks.