I'm scanning a nested directory in a Cloud Storage bucket. The result doesn't contain the matched value (quote) even though I have include_quote turned on. Also, how do I get the names of the files that contain matches, along with the matched values? I'm using Python. This is what I have so far. As you can see, the API found matches, but I'm not getting details on which words (and which files) were flagged.
inspect_job = {
    'inspect_config': {
        'info_types': info_types,
        'min_likelihood': MIN_LIKELIHOOD,
        'include_quote': True,
        'limits': {
            'max_findings_per_request': MAX_FINDINGS
        },
    },
    'storage_config': {
        'cloud_storage_options': {
            'file_set': {
                # '**' matches objects in nested directories recursively
                'url': 'gs://{bucket_name}/{dir_name}/**'.format(
                    bucket_name=STAGING_BUCKET, dir_name=DIR_NAME)
            }
        }
    }
}
operation = dlp.create_dlp_job(parent, inspect_job)
dlp.get_dlp_job(operation.name)
Here is the result:
result {
  processed_bytes: 64
  total_estimated_bytes: 64
  info_type_stats {
    info_type {
      name: "EMAIL_ADDRESS"
    }
    count: 1
  }
  info_type_stats {
    info_type {
      name: "PHONE_NUMBER"
    }
    count: 1
  }
  info_type_stats {
    info_type {
      name: "FIRST_NAME"
    }
    count: 2
  }
}
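The job result returned by get_dlp_job contains only summary statistics (counts per info type), not the individual findings. To get the findings themselves, follow the "Retrieving inspection results" section in https://cloud.google.com/dlp/docs/inspecting-storage and specify a save findings action in the job: https://cloud.google.com/dlp/docs/reference/rest/v2/InspectJobConfig#SaveFindings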
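As a rough sketch of what that could look like with the job config from the question: the helper below builds the same inspect_job dict and adds an actions entry with save_findings pointing at a BigQuery table. The function name build_inspect_job and the project, dataset and table values are hypothetical placeholders, not anything from the question. The BigQuery dataset is assumed to already exist.

```python
def build_inspect_job(bucket_name, dir_name, info_type_names,
                      project_id, dataset_id, table_id,
                      min_likelihood="POSSIBLE", max_findings=100):
    """Build an inspect_job dict that also saves detailed findings to BigQuery.

    Hypothetical helper for illustration; all argument values are placeholders.
    """
    return {
        "inspect_config": {
            "info_types": [{"name": name} for name in info_type_names],
            "min_likelihood": min_likelihood,
            "include_quote": True,
            "limits": {"max_findings_per_request": max_findings},
        },
        "storage_config": {
            "cloud_storage_options": {
                "file_set": {
                    "url": "gs://{}/{}/**".format(bucket_name, dir_name)
                }
            }
        },
        # Without an action, the job only reports summary counts.
        "actions": [
            {
                "save_findings": {
                    "output_config": {
                        "table": {
                            "project_id": project_id,
                            "dataset_id": dataset_id,
                            "table_id": table_id,
                        }
                    }
                }
            }
        ],
    }


# Usage, mirroring the call in the question (needs credentials, so left
# commented out here):
# inspect_job = build_inspect_job(STAGING_BUCKET, DIR_NAME,
#                                 ["EMAIL_ADDRESS", "PHONE_NUMBER"],
#                                 "my-project", "my_dataset", "dlp_findings")
# operation = dlp.create_dlp_job(parent, inspect_job)
```

Once the job finishes, the findings should be available as rows in that table. As far as I know, each row carries the quote (when include_quote is on), and the file it came from is under location.content_locations as container_name, so querying the table is how you would tie a matched value back to its file.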