I am trying to get the Google Cloud Platform Data Loss Prevention (DLP) client library for Python working behind an SSL proxy: https://cloud.google.com/dlp/docs/libraries#client-libraries-usage-python
I am using the code snippet from the docs:
# Import the client library
import google.cloud.dlp
import os
import subprocess
import json
import requests
import getpass
import urllib.parse
import logging
logging.basicConfig(level=logging.DEBUG)
# Instantiate a client.
dlp_client = google.cloud.dlp.DlpServiceClient()
# The string to inspect
content = 'Robert Frost'
# Construct the item to inspect.
item = {'value': content}
# The info types to search for in the content. Required.
info_types = [{'name': 'FIRST_NAME'}, {'name': 'LAST_NAME'}]
# The minimum likelihood to constitute a match. Optional.
min_likelihood = 'LIKELIHOOD_UNSPECIFIED'
# The maximum number of findings to report (0 = server maximum). Optional.
max_findings = 0
# Whether to include the matching string in the results. Optional.
include_quote = True
# Construct the configuration dictionary. Keys which are None may
# optionally be omitted entirely.
inspect_config = {
    'info_types': info_types,
    'min_likelihood': min_likelihood,
    'include_quote': include_quote,
    'limits': {'max_findings_per_request': max_findings},
}
# Convert the project id into a full resource id.
parent = dlp_client.project_path('my-project-id')
# Call the API.
response = dlp_client.inspect_content(parent, inspect_config, item)
# Print out the results.
if response.result.findings:
    for finding in response.result.findings:
        try:
            print('Quote: {}'.format(finding.quote))
        except AttributeError:
            pass
        print('Info type: {}'.format(finding.info_type.name))
        # Convert likelihood value to string representation.
        likelihood = (google.cloud.dlp.types.Finding.DESCRIPTOR
                      .fields_by_name['likelihood']
                      .enum_type.values_by_number[finding.likelihood]
                      .name)
        print('Likelihood: {}'.format(likelihood))
else:
    print('No findings.')
I also set the following environment variable:
GOOGLE_APPLICATION_CREDENTIALS
It runs without issue when I am not behind an SSL proxy. When I am working behind the proxy, I set these three environment variables:
REQUESTS_CA_BUNDLE
HTTP_PROXY
HTTPS_PROXY
With this setup, other GCP client libraries for Python (for example Storage or BigQuery) work fine behind the SSL proxy.
With the DLP client library for Python, I get:
E0920 12:21:49.931000000 24852 src/core/tsi/ssl_transport_security.cc:1229] Handshake failed with fatal error SSL_ERROR_SSL: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed.
DEBUG:google.api_core.retry:Retrying due to 503 Connect Failed, sleeping 0.0s ...
E0920 12:21:50.927000000 24852 src/core/tsi/ssl_transport_security.cc:1229] Handshake failed with fatal error SSL_ERROR_SSL: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed.
DEBUG:google.api_core.retry:Retrying due to 503 Connect Failed, sleeping 0.0s ...
I could not find anything in the documentation explaining whether the library works behind a proxy like the other GCP client libraries, or how to configure it for an SSL proxy. The library is in beta, so this may not be implemented yet.
It seems related to the CA certificate and the TLS handshake. The same CA causes no issue with the BigQuery and Storage client libraries for Python. Any ideas?
Summary:
The Data Loss Prevention client library for Python (google-cloud-dlp) uses gRPC, while google-cloud-bigquery and google-cloud-storage rely on the requests library for JSON-over-HTTPS. Because gRPC does not go through requests, other environment variables need to be set:
GRPC_DEFAULT_SSL_ROOTS_FILE_PATH=path_file.pem
# for debugging
GRPC_TRACE=transport_security,tsi
GRPC_VERBOSITY=DEBUG
More details and links can be found here link
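As a rough illustration of the summary above, the same variables can be set from Python instead of the shell. This is only a sketch: the helper name `point_grpc_at_ca_bundle` is hypothetical, and it assumes that `REQUESTS_CA_BUNDLE` points to a single PEM file (not a directory) that contains the proxy's CA certificate.

```python
import os


def point_grpc_at_ca_bundle(environ=os.environ, ca_path=None, debug=False):
    """Set gRPC's root-certificate variable, defaulting to REQUESTS_CA_BUNDLE.

    Hypothetical helper, not part of any Google library.
    Returns the path that was applied, or None if no path was available.
    """
    # Assumption: the bundle already used by requests is a PEM file
    # that includes the proxy's CA certificate.
    path = ca_path or environ.get("REQUESTS_CA_BUNDLE")
    if not path:
        return None
    environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = path
    if debug:
        # Optional: verbose TLS handshake logging from gRPC.
        environ["GRPC_TRACE"] = "transport_security,tsi"
        environ["GRPC_VERBOSITY"] = "DEBUG"
    return path


# Intended usage (left commented out so the sketch has no side effects):
# point_grpc_at_ca_bundle(debug=True)
# import google.cloud.dlp  # import only after the variables are set
```

Setting the variables before `google.cloud.dlp` (and therefore gRPC) is imported is the cautious choice here, since gRPC may read its environment early. Exporting them in the shell before starting Python should have the same effect.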