Tags: regex, azure, azure-purview

How to create column name pattern matching for data classification in Azure Purview?


All I'm trying to do is classify a field as "Date of Birth" if the column name contains any of the following:

  • DateofBirth
  • BirthDate
  • DOB
  • YMDBIRTH

I'm not a heavy regex user, but I can usually figure it out with a few Google searches. I have tried all of the following as the column pattern in a custom data classification rule:

DateofBirth|BirthDate|DOB|YMDBIRTH

/DateofBirth/|/BirthDate/|/DOB/|/YMDBIRTH/

.*DateOfBirth.*|.*BirthDate.*|.*DOB.*|.*YMDBIRTH.*

/.*DateOfBirth.*|.*BirthDate.*|.*DOB.*|.*YMDBIRTH.*/i

None of these appeared to work. I'm beginning to think it has something to do with my scans. Is there some sort of lag?

I even tried using just YMDBIRTH as the pattern in the classification rule, and it still didn't classify the column after the scan completed.
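To rule out the regex itself before blaming the scan, the four patterns above can be compared locally. This is only a rough sketch: it uses Python's `re` module as a stand-in, and Purview's engine may differ in case sensitivity and in whether it requires a full match. The column names and the `matching_columns` helper are hypothetical and exist only for illustration.

```python
import re

# The four candidate patterns, copied as written above. The slash-delimited
# forms use JavaScript-style literal syntax; Python treats the slashes
# (and the trailing "i") as ordinary characters.
patterns = [
    r"DateofBirth|BirthDate|DOB|YMDBIRTH",
    r"/DateofBirth/|/BirthDate/|/DOB/|/YMDBIRTH/",
    r".*DateOfBirth.*|.*BirthDate.*|.*DOB.*|.*YMDBIRTH.*",
    r"/.*DateOfBirth.*|.*BirthDate.*|.*DOB.*|.*YMDBIRTH.*/i",
]

# Hypothetical column names, for illustration only.
columns = ["DateofBirth", "BirthDate", "DOB", "YMDBIRTH", "CustomerName"]


def matching_columns(pattern, names):
    """Return the names that the pattern matches in full (case-sensitive)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Assumption: full, case-sensitive matching. Purview may behave differently.
results = {p: matching_columns(p, columns) for p in patterns}
for pattern, matched in results.items():
    print(f"{pattern!r} -> {matched}")
```

Under these assumptions, only the first pattern matches all four target names. The slash-delimited variants fail because the slashes are taken literally, and the third and fourth patterns spell `DateOfBirth` with a capital "O", so a case-sensitive match misses a column named `DateofBirth`.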

According to this Microsoft document, I think the very first pattern I listed here, "DateofBirth|BirthDate|DOB|YMDBIRTH", should have worked:

https://learn.microsoft.com/en-us/azure/purview/create-a-custom-classification-and-classification-rule

The document says:

Optionally, if the data usually is in a column that they know the name of, such as Employee_ID or EmployeeID, they can add a column pattern regular expression to make the scan even more accurate. An example regex is Employee_ID|EmployeeID

Based on this, I would expect my rule to work. (Screenshot: Classification Rule)


Solution

  • I was unaware of this, but in the scan rule set, if you create a new custom rule that you believe replaces a system rule and you uncheck that system rule, it appears that the scan will not apply your custom rule.

    In my case, I had the Date of Birth system rule unchecked. After checking it (along with my custom rule), the scan worked and properly classified the column.

    See Screenshot: Selected Classification Rules

    Also, my first pattern, DateofBirth|BirthDate|DOB|YMDBIRTH, was correct and worked fine after this change.