Custom info type and hotword rule


I am trying to use a customInfoType combined with a hotwordRule. The configuration looks like this (taken from the Node.js implementation):

custom info type:

const customConfig = [{
    infoType: {
      name: 'CZECH_ID'
    },
    regex: {
      pattern: /[0-9]{2,6}-?[0-9]{2,10}\/[0-9]{4}/
    },
    likelihood: 'POSSIBLE'
  }];

custom rule set:

const customRuleSet = [{
    infoTypes: [{ name: 'CZECH_ID' }],
    rules: [
      {
        hotwordRule: {
          hotwordRegex: {
            pattern: /^CZID$/
          }
        },
        proximity: {
          windowBefore: 10,
          windowAfter: 0
        }
      }
    ]
  }]

and here is the inspectConfig:

const request = {
    parent: `projects/${projectId}/locations/global`,
    inspectConfig: {
      infoTypes: infoTypes,
      customInfoTypes: customConfig,
      ruleSet: customRuleSet,
      minLikelihood: 'POSSIBLE',
      limits: {
        maxFindingsPerRequest: maxFindings,
      },
      includeQuote: true,
    },
    item: item,
  };

When running it I get:

Error: 3 INVALID_ARGUMENT: `window_before` and `window_after` cannot both be 0.

When I remove customRuleSet from the request, it passes, but without identifying the string. So the problem has something to do with the proximity section, though I am not sure what is wrong.


Solution

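  • Your JSON looks off: proximity is not inside the hotword rule. In the question it sits next to hotwordRule as a sibling, so the hotword rule itself has no proximity and both windows are treated as 0, which is what the error reports. Move proximity inside the hotword rule: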

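    import google.cloud.dlp_v2

    hotword_rule = {
        # Plain RE2 pattern string, without JavaScript-style /.../ delimiters.
        # Anchors are dropped on the assumption that the hotword appears
        # inside surrounding text rather than being the whole content.
        "hotword_regex": {"pattern": "CZID"},
        "likelihood_adjustment": {
            "fixed_likelihood": google.cloud.dlp_v2.Likelihood.VERY_LIKELY
        },
        # proximity belongs inside the hotword rule, not next to it.
        "proximity": {"window_before": 10},
    }

    rule_set = [
        {"info_types": [{"name": "CZECH_ID"}], "rules": [{"hotword_rule": hotword_rule}]}
    ]
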

    There is a Python example here:

    https://cloud.google.com/dlp/docs/creating-custom-infotypes-likelihood#dlp_inspect_hotword_rule-python
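
For the Node.js configuration from the question, the same fix might look like the sketch below. It assumes the Node.js client accepts the camelCase field names already used in the question (hotwordRule, hotwordRegex, proximity) plus likelihoodAdjustment, and string enum values such as 'VERY_LIKELY', the same way the question passes 'POSSIBLE'. The patterns are written as plain strings because the API takes RE2 pattern strings rather than JavaScript RegExp literals; the unanchored hotword CZID is an assumption about the intended match. The hasNestedProximity helper is hypothetical and only checks the object shape locally, without calling the API.

```javascript
// Custom infoType: the pattern is a plain string, not a /.../ RegExp literal.
const customInfoTypes = [{
  infoType: { name: 'CZECH_ID' },
  regex: { pattern: '[0-9]{2,6}-?[0-9]{2,10}/[0-9]{4}' },
  likelihood: 'POSSIBLE',
}];

// Rule set: proximity and likelihoodAdjustment live INSIDE hotwordRule.
const customRuleSet = [{
  infoTypes: [{ name: 'CZECH_ID' }],
  rules: [{
    hotwordRule: {
      hotwordRegex: { pattern: 'CZID' }, // assumed hotword, unanchored
      proximity: { windowBefore: 10, windowAfter: 0 },
      likelihoodAdjustment: { fixedLikelihood: 'VERY_LIKELY' },
    },
  }],
}];

// Hypothetical local check: every hotword rule must carry its own proximity
// with at least one non-zero window, mirroring the error from the question.
function hasNestedProximity(ruleSet) {
  return ruleSet.every(set =>
    set.rules.every(rule => {
      if (!rule.hotwordRule) return true; // not a hotword rule, nothing to check
      const p = rule.hotwordRule.proximity || {};
      return (p.windowBefore || 0) > 0 || (p.windowAfter || 0) > 0;
    })
  );
}

console.log(hasNestedProximity(customRuleSet)); // true
```

These two arrays can replace customConfig and customRuleSet in the original request; the rest of the inspectConfig stays as it was.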