GCP Data Loss Prevention - not detecting local types


I am working with GCP's DLP API and have trouble detecting country-specific infoTypes. Global infoTypes, on the other hand, are detected without problems (here you can find the list of types). Does anyone have suggestions on how to fix this? In case it helps, I'm working from outside the US.

This is a copy of my config file:

# Map each infoType name to the placeholder that should replace it.
info_types_rep_names = {
    "PHONE_NUMBER": "[PHONE]",
    "EMAIL_ADDRESS": "[EMAIL]",
    "US_PASSPORT": "[PASSPORT]",
}

# The infoTypes to look for.
info_types = [{"name": key} for key in info_types_rep_names]

# One replace transformation per infoType.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {
                "info_types": [{"name": key}],
                "primitive_transformation": {
                    "replace_config": {
                        "new_value": {"string_value": value}
                    }
                },
            }
            for key, value in info_types_rep_names.items()
        ]
    }
}
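
For context, here is a minimal sketch of how a config like the one above is usually sent to the API. It assumes the google-cloud-dlp Python client and a global processing location; the helper names build_configs and deidentify_text are hypothetical and only serve the illustration.

```python
def build_configs(replacements):
    """Build (inspect_config, deidentify_config) from {infoType name: placeholder}."""
    inspect_config = {"info_types": [{"name": name} for name in replacements]}
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "info_types": [{"name": name}],
                    "primitive_transformation": {
                        "replace_config": {"new_value": {"string_value": token}}
                    },
                }
                for name, token in replacements.items()
            ]
        }
    }
    return inspect_config, deidentify_config


def deidentify_text(project_id, text, replacements):
    """Call the DLP API. Needs google-cloud-dlp installed and valid credentials."""
    # Imported lazily so build_configs can be used without the client library.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    inspect_config, deidentify_config = build_configs(replacements)
    response = client.deidentify_content(
        request={
            # Assumption: the global location; a regional one may be needed.
            "parent": f"projects/{project_id}/locations/global",
            "inspect_config": inspect_config,
            "deidentify_config": deidentify_config,
            "item": {"value": text},
        }
    )
    return response.item.value
```

Keeping the builder separate from the API call makes it easy to print both configs and confirm that every infoType being replaced is also listed in the inspect config.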

Solution

  • The location can affect detection in some scenarios. Refer to this document for country-specific values.

    Looking at your code, there is a chance that the sample value you are testing with is itself invalid. I tested on our end with the Python sample code (refer to the document), using your info_types_rep_names and changing only the project ID, with this input:

    input_str = 'Please call me. My phone number is. My email, just in case, is [email protected]. Take a note of US passport number: C03005988. Or maybe C03004786'

    US_PASSPORT was detected, but keep in mind that a sample number has to be valid for the detector to match it. It is also possible that only one particular country-specific infoType fails, yet when I tested with another country-specific value, it was detected as well. All of my results matched the demo detection.

    Also make sure the right infoTypes are in use in the right section. Follow the link, and in the options tab you can view and adjust the infoTypes.

    So my first question is: are your results aligned with the correct infoTypes or not?

    If they are aligned, then either detection fails for specific samples or the sample is not valid.

    If the sample is not valid, then the cause is either a wrong sample or a bug in the code. Please check and reply if there are still issues.
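
One way to check whether the requested infoTypes line up with what is actually detected is to run an inspect request with quotes enabled and compare the two lists. This is only a sketch, assuming the google-cloud-dlp Python client; find_matches and missing_info_types are hypothetical helper names, and the POSSIBLE likelihood threshold and global location are assumptions.

```python
def missing_info_types(requested, findings):
    """Return requested infoType names that have no finding, in request order.

    `findings` is a list of (info_type_name, quote) pairs.
    """
    detected = {name for name, _quote in findings}
    return [name for name in requested if name not in detected]


def find_matches(project_id, text, requested):
    """Inspect `text` and return (info_type_name, quote) pairs.

    Needs google-cloud-dlp installed and valid credentials.
    """
    # Imported lazily so missing_info_types works without the client library.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.inspect_content(
        request={
            "parent": f"projects/{project_id}/locations/global",
            "inspect_config": {
                "info_types": [{"name": name} for name in requested],
                # Assumption: a permissive threshold, to surface weak matches.
                "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
                "include_quote": True,
            },
            "item": {"value": text},
        }
    )
    return [(f.info_type.name, f.quote) for f in response.result.findings]
```

If a country-specific infoType shows up in missing_info_types while the global ones do not, the sample value itself is the first thing to check, followed by the infoType name and the location.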