python-3.x google-api-python-client mwaa

MWAA: Timeout on googleapiclient Connection


We are getting timeouts when connecting to Google Sheets through googleapiclient. The code had been working, but after a new deployment we started getting this error. Even after rolling back the changes, the error persists.

We run Airflow 2.6.3 on MWAA and package our dependencies as Python wheel (WHL) files. We tried installing the requirements from the Python Package Index instead, but the installation timed out with "WARNING: requirements.txt installation timed out after 9 minutes. Some requirements may not have installed." and the DAGs were broken.

Airflow can connect to other third-party services (Jira and others), but DAGs that connect to the Google Sheets API fail.

Please share any solution, or any place we could look to resolve the issue. Thanks.

Code Snippet

from googleapiclient.discovery import build

# <credentials> and <spreadsheet_id> stand in for our real values.
service = getattr(build(
    serviceName='sheets',
    version='v4',
    credentials=<credentials>), 'spreadsheets')()
service.get(spreadsheetId=<spreadsheet_id>).execute()

We get the following stack trace:

Traceback (most recent call last):
  File "/usr/local/airflow/dags/common/spreadsheet.py", line 199, in get_spreadsheet
    return service.get(spreadsheetId=self._id).execute()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 923, in execute
    resp, content = _retry_request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 222, in _retry_request
    raise exception
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/googleapiclient/http.py", line 191, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google_auth_httplib2.py", line 209, in request
    self.credentials.before_request(self._request, method, uri, request_headers)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/auth/credentials.py", line 151, in before_request
    self.refresh(request)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/service_account.py", line 434, in refresh
    access_token, expiry, _ = _client.jwt_grant(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 312, in jwt_grant
    response_data = _token_endpoint_request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 272, in _token_endpoint_request
    response_status_ok, response_data, retryable_error = _token_endpoint_request_no_throw(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 219, in _token_endpoint_request_no_throw
    request_succeeded, response_data, retryable_error = _perform_request()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google/oauth2/_client.py", line 195, in _perform_request
    response = request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/google_auth_httplib2.py", line 119, in __call__
    response, data = self.http.request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1724, in request
    (response, content) = self._request(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1444, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1366, in _conn_request
    conn.connect()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/httplib2/__init__.py", line 1156, in connect
    sock.connect((self.host, self.port))
TimeoutError: timed out

Configurations:

MWAA: Airflow 2.6.3

Installed Packages (Using plugins.zip):
 - Levenshtein-0.21.1
 - PyGithub-1.59.0
 - adtk-0.6.2
 - apache-airflow-providers-atlassian-jira-2.1.1
 - apache-airflow-providers-github-2.3.1
 - apache-airflow-providers-mysql-5.1.1
 - apache-airflow-providers-snowflake-4.2.0
 - asttokens-2.2.1
 - atlassian-python-api-3.39.0
 - aws-requests-auth-0.4.3
 - backcall-0.2.0
 - cachetools-5.3.1
 - comm-0.2.2
 - cycler-0.12.1
 - debugpy-1.8.1
 - defusedxml-0.7.1
 - executing-1.2.0
 - fonttools-4.50.0
 - google-api-core-2.11.0
 - google-api-python-client-2.92.0
 - google-auth-2.21.0
 - google-auth-httplib2-0.1.0
 - googleapis-common-protos-1.59.1
 - gql-3.3.0
 - graphql-core-3.2.3
 - httplib2-0.22.0
 - iniconfig-2.0.0
 - ipykernel-6.25.1
 - ipython-8.14.0
 - jedi-0.18.2
 - jira-3.5.2
 - joblib-1.3.2
 - jupyter-client-8.3.0
 - jupyter-core-5.3.1
 - kiwisolver-1.4.5
 - matplotlib-3.5.2
 - matplotlib-inline-0.1.6
 - mpld3-0.5.9
 - mysqlclient-2.2.0
 - nest-asyncio-1.6.0
 - numpy-1.24.4
 - oauthlib-3.2.2
 - oscrypto-1.3.0
 - pandas-1.5.3
 - parso-0.8.3
 - patsy-0.5.6
 - pickleshare-0.7.5
 - pillow-10.2.0
 - playwright-1.37.0
 - protobuf-4.23.4
 - pure-eval-0.2.2
 - py-1.11.0
 - pyOpenSSL-23.2.0
 - pyasn1-0.4.8
 - pyasn1-modules-0.2.8
 - pycryptodomex-3.18.0
 - pyee-9.0.4
 - pynacl-1.5.0
 - pypika-0.48.9
 - pytest-7.4.0
 - python-Levenshtein-0.21.1
 - pyzmq-25.1.0
 - requests-oauthlib-1.3.1
 - retry-0.9.2
 - rsa-4.9
 - scikit-learn-1.3.0
 - scipy-1.12.0
 - snowflake-connector-python-3.0.4
 - snowflake-sqlalchemy-1.4.7
 - sortedcontainers-2.4.0
 - sql-formatter-0.6.2
 - stack-data-0.6.2
 - statsmodels-0.14.1
 - thefuzz-0.20.0
 - threadpoolctl-3.4.0
 - traitlets-5.9.0
 - uritemplate-4.1.1

Solution

  • For anyone who lands here:

    After a lot of trial and error, we found that the problem was IPv6 on our network interfering with the Google API packages (per this answer: https://stackoverflow.com/a/75375184/15938510). We removed IPv6 from the AWS network, and the code now works normally.
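
    If removing IPv6 from the network is not an option, a process-level workaround may be worth trying. The sketch below is not what we deployed. It assumes the HTTP stack resolves hosts through socket.getaddrinfo (httplib2 appears to) and simply forces those lookups to IPv4. The helper names (force_ipv4, restore_default_resolution, can_connect) are made up for illustration.

```python
import socket

# Keep a reference to the real resolver so the patch can be undone.
_original_getaddrinfo = socket.getaddrinfo


def _ipv4_only_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    """Resolve like socket.getaddrinfo, but always ask for IPv4 (AF_INET)."""
    return _original_getaddrinfo(host, port, socket.AF_INET, type, proto, flags)


def force_ipv4():
    """Make every later socket.getaddrinfo call in this process IPv4-only."""
    socket.getaddrinfo = _ipv4_only_getaddrinfo


def restore_default_resolution():
    """Undo force_ipv4()."""
    socket.getaddrinfo = _original_getaddrinfo


def can_connect(host, port, family, timeout=5.0):
    """Return True if a TCP connection over the given address family succeeds."""
    try:
        infos = _original_getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # the host has no address of this family
    for af, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:  # includes timeouts and refused connections
            continue
    return False
```

    To check whether IPv6 is the culprit, compare can_connect(host, 443, socket.AF_INET) with can_connect(host, 443, socket.AF_INET6) from inside a task, where host is the token endpoint your credentials use (oauth2.googleapis.com is an assumption here; check the token_uri in your service account key). If only the IPv6 call fails, calling force_ipv4() once before build(...) should keep the client on IPv4. The patch affects every library in the process, so fixing the network is still the cleaner solution.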