
Requests to Google Cloud ML timeout

I'm making online prediction requests from Google App Engine to Google Cloud ML (I didn't create the model), and from time to time I get the exception "Deadline exceeded while waiting for HTTP response from URL". Full trace:

    Deadline exceeded while waiting for HTTP response from URL: (/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/
Traceback (most recent call last):
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/", line 1535, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/", line 1529, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/", line 1102, in __call__
    return handler.dispatch()
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/", line 572, in dispatch
    return self.handle_exception(e,
  File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/", line 570, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/", line 90, in post
    response = predict(batch_obj=batch_data_obj)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/", line 88, in predict
    response = request.execute()
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/oauth2client/", line 135, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/googleapiclient/", line 835, in execute
    method=str(self.method), body=self.body, headers=self.headers)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/googleapiclient/", line 162, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/oauth2client/", line 631, in new_request
    redirections, connection_type)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/httplib2/", line 1659, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/httplib2/", line 1399, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/base/data/home/apps/s~project-id/1.402312581449917691/lib/httplib2/", line 1355, in _conn_request
    response = conn.getresponse()
  File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/gae_override/", line 526, in getresponse
    raise HTTPException(str(e))
HTTPException: Deadline exceeded while waiting for HTTP response from URL:

I know that Google App Engine has a 60-second limit for responses, which is why I am making the requests within a task queue. I also tried the following things:


I am constructing the API client like this:

import httplib2
from googleapiclient import discovery
from oauth2client import service_account

credentials = service_account.ServiceAccountCredentials.from_json_keyfile_name('credentials-file', scopes)
http = httplib2.Http(timeout=36000)
http = credentials.authorize(http)

ml = discovery.build('ml', 'v1', http=http)
request = ml.projects().predict(name=predict_ver_name, body=request_data)

Interestingly, the timeout sometimes happens at around 70 s (69.9, 70, 70.1, etc.) and sometimes at around 120 s (119.8, 120.1, etc.), which suggests it may have more to do with some internal Cloud ML deadline. I am making a few tens of requests in parallel through the task queue. Successful response times range from a few seconds to ~110 s. I'm curious whether anybody has had a similar experience or can advise me on how to solve this, i.e. what is causing the deadlines.
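To check whether failures really cluster around the two durations above, each call could be timed and the elapsed time compared against the suspected deadlines. This is only a minimal sketch: `timed_call` and `nearest_deadline` are hypothetical helper names, and the 70 s / 120 s candidates and the 1 s tolerance are assumptions taken from the timings I observed, not documented limits.

```python
import time


def timed_call(fn, clock=time.time):
    """Run fn() and return (result, error, elapsed_seconds)."""
    start = clock()
    try:
        return fn(), None, clock() - start
    except Exception as exc:  # broad on purpose: we only want to measure
        return None, exc, clock() - start


def nearest_deadline(elapsed, candidates=(70.0, 120.0), tolerance=1.0):
    """Return the assumed deadline within `tolerance` seconds of elapsed, else None."""
    for candidate in candidates:
        if abs(elapsed - candidate) <= tolerance:
            return candidate
    return None
```

In the handler, `timed_call(request.execute)` would wrap the call from the snippet above, and the elapsed time could be logged together with `nearest_deadline(elapsed)` for every failure.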


  • Thanks for posting your experience. There is some startup cost, and depending on the rate of requests, more than one server may need to be brought up to serve the load. What is the size of the model you are trying to predict on? Larger models tend to have larger startup costs.
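If startup cost is indeed the cause, one possible mitigation is to retry a failed prediction after a pause, so that a server that was still starting has time to come up. The following is a rough sketch under that assumption, not a confirmed fix: `call_with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary illustrative values.

```python
import random
import time


def call_with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception wait with exponential backoff and try again.

    The delay before retry number `attempt` is base_delay * 2**attempt plus
    up to 50% random jitter. The last failure is re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

With the client from the question this would be used as `response = call_with_retries(request.execute)`. The jitter is there so that a few tens of parallel tasks do not all retry at the same instant.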
