Tags: python-3.x, google-cloud-platform, google-cloud-functions, google-cloud-logging

How do I debug this stack trace? (google.auth.transport.grpc.AuthMetadataPlugin)


I've created a function that:

  • runs every minute via Cloud Scheduler
  • reads a blob as a string from a Cloud Storage bucket
  • writes messages to a Pub/Sub topic

I'm getting connection errors about 5% of the time, and the stack trace references only Python site-packages, not my own code. How can I debug this further?

I added retries around every step that reads from Cloud Storage, but the failure seems to occur before my code even starts running. Alternatively, are my logs not making it to Stackdriver?

Here is the full stack trace. I don't see where any of it references lines in my code.

Function execution started
AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x7ea453f9e780>" raised exception!
Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 977, in send
    self.sock.sendall(data)
ConnectionResetError: [Errno 104] Connection reset by peer
None
During handling of the above exception, another exception occurred:
None
Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/env/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/env/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/opt/python3.7/lib/python3.7/http/client.py", line 977, in send
    self.sock.sendall(data)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
None
During handling of the above exception, another exception occurred:
None
Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/auth/transport/requests.py", line 123, in __call__
    method, url, data=body, headers=headers, timeout=timeout, **kwargs
  File "/env/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/env/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/env/local/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
None
The above exception was the direct cause of the following exception:
None
Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/auth/compute_engine/credentials.py", line 96, in refresh
    self._retrieve_info(request)
  File "/env/local/lib/python3.7/site-packages/google/auth/compute_engine/credentials.py", line 77, in _retrieve_info
    request, service_account=self._service_account_email
  File "/env/local/lib/python3.7/site-packages/google/auth/compute_engine/_metadata.py", line 200, in get_service_account_info
    recursive=True,
  File "/env/local/lib/python3.7/site-packages/google/auth/compute_engine/_metadata.py", line 132, in get
    response = request(url=url, method="GET", headers=_METADATA_HEADERS)
  File "/env/local/lib/python3.7/site-packages/google/auth/transport/requests.py", line 128, in __call__
    six.raise_from(new_exc, caught_exc)
  File "<string>", line 3, in raise_from
google.auth.exceptions.TransportError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
None
The above exception was the direct cause of the following exception:
None
Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/grpc/_plugin_wrapping.py", line 79, in __call__
    callback_state, callback))
  File "/env/local/lib/python3.7/site-packages/google/auth/transport/grpc.py", line 77, in __call__
    callback(self._get_authorization_headers(context), None)
  File "/env/local/lib/python3.7/site-packages/google/auth/transport/grpc.py", line 64, in _get_authorization_headers
    self._request, context.method_name, context.service_url, headers
  File "/env/local/lib/python3.7/site-packages/google/auth/credentials.py", line 124, in before_request
    self.refresh(request)
  File "/env/local/lib/python3.7/site-packages/google/auth/compute_engine/credentials.py", line 102, in refresh
    six.raise_from(new_exc, caught_exc)
  File "<string>", line 3, in raise_from
google.auth.exceptions.RefreshError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
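
One way to read a chained trace like the one above: each "direct cause" block wraps the exception before it, so the first block (ConnectionResetError) is the root cause and the last (RefreshError) is what finally surfaced. The frames under google/auth/compute_engine show the reset happened while the credentials were being refreshed, which is why no user code appears. The sketch below walks such a chain programmatically. The RefreshError class and refresh_credentials function here are hypothetical stand-ins so the example runs without any Google libraries installed.

```python
def exception_chain(exc):
    """Return the exceptions from the outermost one down to the root cause."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        chain.append(exc)
        # __cause__ is set by `raise ... from ...`; __context__ is set implicitly
        # when an exception is raised while another is being handled.
        exc = exc.__cause__ or exc.__context__
    return chain


class RefreshError(Exception):
    """Hypothetical stand-in for google.auth.exceptions.RefreshError."""


def refresh_credentials():
    """Hypothetical stand-in for a credential refresh that hits a reset."""
    try:
        raise ConnectionResetError(104, "Connection reset by peer")
    except ConnectionResetError as caught:
        raise RefreshError("Connection aborted.") from caught


try:
    refresh_credentials()
except RefreshError as err:
    names = [type(e).__name__ for e in exception_chain(err)]
    print(" <- ".join(names))  # prints: RefreshError <- ConnectionResetError
```

Logging the last element of the chain alongside the outer exception makes it easier to tell a network reset from a genuine permissions problem.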

I thought the issue was blob.download_as_string() hitting a connection error.

However, after deploying a simplified version of the function, I cannot reproduce the error.

This thread suggests adding ConnectionResetError and ProtocolError to the exceptions that are retried:

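from urllib3.exceptions import ProtocolError
from google.api_core import retry

# Retry only when the connection is reset or aborted mid-request.
predicate = retry.if_exception_type(
    ConnectionResetError, ProtocolError)
reset_retry = retry.Retry(predicate)

# `blob` is a google.cloud.storage Blob obtained earlier in the function.
data = reset_retry(blob.download_as_string)()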
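
To make the idea behind that wrapper concrete, here is a minimal stdlib-only sketch of retrying on selected exception types with exponential backoff. It is not how google.api_core implements Retry. The names retry_on and flaky_download, and the default delays, are assumptions chosen for illustration.

```python
import random
import time


def retry_on(exceptions, attempts=5, initial_delay=0.1, multiplier=2.0,
             sleep=time.sleep):
    """Decorator factory: retry the wrapped call when it raises `exceptions`."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    # Sleep a jittered delay, then back off exponentially.
                    sleep(random.uniform(0, delay))
                    delay *= multiplier
        return wrapper
    return decorator


calls = {"count": 0}


# The no-op sleep keeps the demo instant; drop it to get real delays.
@retry_on((ConnectionResetError,), sleep=lambda seconds: None)
def flaky_download():
    """Hypothetical download that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionResetError(104, "Connection reset by peer")
    return b"blob contents"


result = flaky_download()
print(result, calls["count"])  # prints: b'blob contents' 3
```

Exceptions that are not listed propagate immediately, so a retry like this only masks the connection resets and leaves other failures visible.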

I wish I knew why this connection error happens so often.


Solution

  • I've discovered the cause of this intermittent error.

    GCP best practices say to instantiate client connections at global scope in main.py, outside of main(). Global-scope code runs only on an instance cold start, so the clients are reused by later invocations on the same instance.

    For example:

    [main.py] - instantiates clients only during cold start

    import builtins
    from google.cloud import storage
    from google.cloud import pubsub_v1
    from google.cloud import logging as cloudlogging
    
    # Create global clients to avoid unneeded network activity!
    builtins.pubsub_client = pubsub_v1.PublisherClient()
    builtins.storage_client = storage.Client()
    builtins.log_client = cloudlogging.Client()
    

    [other_func.py] - uses clients

    # storage_client resolves through builtins, so no import is needed here.
    bucket = storage_client.create_bucket(bucket_name)
    

    Examples relevant to networking

    Examples relevant to logging
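
As a variation on the global-client pattern above, the sketch below keeps the client in a module-level variable behind an accessor instead of attaching it to builtins. FakeStorageClient is a hypothetical stand-in for storage.Client() so the example runs without the Google libraries, and get_storage_client and the blob name are likewise assumptions. The loop simulates several invocations on one warm instance.

```python
class FakeStorageClient:
    """Hypothetical stand-in for storage.Client(); counts how often it is built."""
    instances = 0

    def __init__(self):
        FakeStorageClient.instances += 1

    def read_blob(self, name):
        return f"contents of {name}"


# Module scope: this state lives for the lifetime of the instance,
# not for a single invocation.
_storage_client = None


def get_storage_client():
    """Create the client on first use, then reuse it for later invocations."""
    global _storage_client
    if _storage_client is None:
        _storage_client = FakeStorageClient()
    return _storage_client


def main(event=None, context=None):
    """Entry point: called on every invocation of the warm instance."""
    return get_storage_client().read_blob("config.json")


# Simulate three invocations handled by the same instance.
for _ in range(3):
    output = main()
print(FakeStorageClient.instances)  # prints: 1
```

Other modules can import get_storage_client explicitly, which keeps the dependency visible to linters and tests, whereas names placed on builtins appear from nowhere.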