Tags: java, google-cloud-platform, google-bigquery, apache-beam, google-cloud-dlp

Google Cloud DLP API InspectResult


Good day!

I'm using the Cloud DLP API to inspect BigQuery views by converting chunks of the data into ContentItem objects and passing them to the inspect request. However, I'm having trouble converting the findings and saving them to a BigQuery table. Previously I used an Airflow DLP operator, which handled this automatically when I passed an output storage config in the inspect job configuration. That approach no longer applies, because I now call the DLP API once per chunk of data from Apache Beam in Java.

I saw that the Finding object has a writeTo() method, but I'm not sure how to use it or how to save the findings to a BigQuery table with the correct types. Can you help me with this? I'm currently stuck. Thank you!

What I want to do is something like this:

for (Finding res : result.getFindingsList()){
        TableRow bqRow = new TableRow();
        Object data = res.getLocation();
        bqRow.set("field", data);
        context.output(bqRow);
}

But this approach wouldn't save the data to BigQuery with the correct types, especially for getLocation(), which returns a protobuf message.

I was trying to see whether I could use the writeTo() method, but I'm not sure how to use it. Thank you in advance for the help!

for (Finding res : result.getFindingsList()){
        res.writeTo(...)
        ...
        context.output(...);
}

Solution

  • If you use hybrid inspection (HybridInspect), we'll store the findings in BigQuery for you.

    https://cloud.google.com/dlp/docs/how-to-hybrid-jobs

    If you do it yourself, you will need to convert the findings to a format BigQuery accepts natively, such as JSON.

    See also: Load protobuf data to bigquery
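
For the do-it-yourself route, one option is to flatten each finding into plain strings and longs before building the row. As far as I know, writeTo() only serializes the message in protobuf binary wire format, so it does not help with typed BigQuery columns. The sketch below is a minimal illustration, not the DLP library's own method: the FindingRows class, the toRow helper, and the column names are all hypothetical, and plain arguments stand in for the Finding getters so that it runs without the DLP client library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps the scalar parts of a DLP Finding to BigQuery-friendly types.
class FindingRows {

    // Column names are assumptions; match them to your own table schema.
    static Map<String, Object> toRow(String quote, String infoType, String likelihood,
                                     long byteStart, long byteEnd) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("quote", quote);            // STRING column
        row.put("info_type", infoType);     // STRING column
        row.put("likelihood", likelihood);  // STRING column (enum name)
        row.put("byte_start", byteStart);   // INT64 column
        row.put("byte_end", byteEnd);       // INT64 column
        return row;
    }

    public static void main(String[] args) {
        // In the DoFn, the arguments would come from the Finding getters, e.g.
        // res.getQuote(), res.getInfoType().getName(), res.getLikelihood().name(),
        // res.getLocation().getByteRange().getStart() and getEnd().
        Map<String, Object> row =
                toRow("jane@example.com", "EMAIL_ADDRESS", "LIKELY", 10L, 26L);
        System.out.println(row);
    }
}
```

Because TableRow is itself a Map<String, Object>, the same entries can be copied into a TableRow with set() or putAll() before calling context.output(). Nested parts of the location that you don't want to flatten could instead be stored as a JSON string.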