
How to configure the Google Cloud Platform Data Loss Prevention client library for Python to work behind an SSL proxy?


I am trying to get the Google Cloud Platform Data Loss Prevention (DLP) client library for Python working behind an SSL proxy: https://cloud.google.com/dlp/docs/libraries#client-libraries-usage-python

I am using the code snippet from the doc:

# Import the client library
import google.cloud.dlp
import os
import subprocess
import json
import requests
import getpass
import urllib.parse

import logging

logging.basicConfig(level=logging.DEBUG)

# Instantiate a client.
dlp_client = google.cloud.dlp.DlpServiceClient()

# The string to inspect
content = 'Robert Frost'

# Construct the item to inspect.
item = {'value': content}

# The info types to search for in the content. Required.
info_types = [{'name': 'FIRST_NAME'}, {'name': 'LAST_NAME'}]

# The minimum likelihood to constitute a match. Optional.
min_likelihood = 'LIKELIHOOD_UNSPECIFIED'

# The maximum number of findings to report (0 = server maximum). Optional.
max_findings = 0

# Whether to include the matching string in the results. Optional.
include_quote = True

# Construct the configuration dictionary. Keys which are None may
# optionally be omitted entirely.
inspect_config = {
    'info_types': info_types,
    'min_likelihood': min_likelihood,
    'include_quote': include_quote,
    'limits': {'max_findings_per_request': max_findings},
}

# Convert the project id into a full resource id.
parent = dlp_client.project_path('my-project-id')

# Call the API.
response = dlp_client.inspect_content(parent, inspect_config, item)

# Print out the results.
if response.result.findings:
    for finding in response.result.findings:
        try:
            print('Quote: {}'.format(finding.quote))
        except AttributeError:
            pass
        print('Info type: {}'.format(finding.info_type.name))
        # Convert likelihood value to string representation.
        likelihood = (google.cloud.dlp.types.Finding.DESCRIPTOR
                      .fields_by_name['likelihood']
                      .enum_type.values_by_number[finding.likelihood]
                      .name)
        print('Likelihood: {}'.format(likelihood))
else:
    print('No findings.')

I also set the following environment variable:

GOOGLE_APPLICATION_CREDENTIALS

It runs without issue when I am not behind an SSL proxy. When I am working behind a proxy, I set these three environment variables:

REQUESTS_CA_BUNDLE
HTTP_PROXY
HTTPS_PROXY

With this setup, other GCP Python client libraries (for example Storage or BigQuery) work fine behind an SSL proxy.

With the DLP Python client library, I am getting:

E0920 12:21:49.931000000 24852 src/core/tsi/ssl_transport_security.cc:1229] Handshake failed with fatal error SSL_ERROR_SSL: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed.
DEBUG:google.api_core.retry:Retrying due to 503 Connect Failed, sleeping 0.0s ...
E0920 12:21:50.927000000 24852 src/core/tsi/ssl_transport_security.cc:1229] Handshake failed with fatal error SSL_ERROR_SSL: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed.
DEBUG:google.api_core.retry:Retrying due to 503 Connect Failed, sleeping 0.0s ...

I didn't find anything in the documentation explaining whether the library works behind a proxy like the other GCP client libraries, or how to configure it to work with an SSL proxy. The library is in beta, so it could be that this is not yet implemented.

It seems related to the CA certificate and the TLS handshake. There is no issue with the same CA for the BigQuery and Storage Python client libraries. Any ideas?


Solution

  • Summary:

    1. The Data Loss Prevention client library for Python (google-cloud-dlp) uses gRPC, while google-cloud-bigquery and google-cloud-storage rely on the requests library for JSON-over-HTTPS. Because it uses gRPC, other environment variables need to be set:

      GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=path_file.pem  
      # for debugging
      GRPC_TRACE=transport_security,tsi
      GRPC_VERBOSITY=DEBUG
      

      More details and links can be found here (link).

    2. This doesn't solve all the issues, because the call continues to fail after the handshake (TLS proxy), as described here (link). As @John Hanley explained well, we should enable Private Google Access instead, which is the recommended and secure way. This is not yet in place in the network zone where I am using the APIs, so the proxy team added an SSL bypass and it is now working. I am waiting for Private Google Access to be enabled so that I have a clean and secure setup for using GCP APIs.