pentaho, kettle, pentaho-spoon, pdi, data-masking

How should I perform data masking with Pentaho PDI (Spoon)?


I need to mask data in more than 10 tables, each of which has more than 100 columns.

I tried to mask the data with the Pentaho PDI tool, but I couldn't figure out how to set up the masking in it.

How should I perform data masking with Pentaho? I think one way is to use the step named "Replace in String", but I couldn't get it to change any string when I tried it.

My questions are:

  1. Is "Replace in String" the right way to do data masking?
  2. If it is, how should I fill in the values in the respective fields?

I want to replace part of each value with a masking character such as * or x. For example, if the value is "this is sample value", it should become something like "txxx xx xxxxx xxxxe".

[screenshot of PDI]

Please help.


Solution

  • This is less about Kettle than about regular expressions. I can confirm that the "Replace in String" step behaves strangely and unpredictably when a regex is used inside it, and the official docs explain very little about that step. In any case, you can use the "Regex Evaluation" step to capture the part you need and replace it inside the original string.

    But there is a workaround that makes it easier:

    [screenshot of the workaround]
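
Since the answer frames this as a regex problem, here is a minimal sketch in plain Java of one pattern that produces the kind of mask the question asks for. It is not the workaround from the screenshot. The class name `MaskDemo`, the method `mask`, and the choice of `x` as the masking character are assumptions made for illustration. The pattern keeps the first and last character of the whole string, keeps whitespace, and replaces every other character, so the length is preserved.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDI: masks the inner characters of a value.
final class MaskDemo {
    // Matches a non-whitespace character that has at least one character
    // before it and at least one after it, i.e. everything except the first
    // and last character of the string. Whitespace never matches, so it is kept.
    private static final Pattern INNER = Pattern.compile("(?<=.)\\S(?=.)");

    static String mask(String value) {
        if (value == null) {
            return null; // pass database NULLs through unchanged
        }
        return INNER.matcher(value).replaceAll("x");
    }

    public static void main(String[] args) {
        // prints "txxx xx xxxxxx xxxxe"
        System.out.println(mask("this is sample value"));
    }
}
```

PDI is written in Java, so the same expression might also be usable in a regex-enabled step or inside a "User Defined Java Class" step, but that is untested here, and the answer above notes that regex handling in "Replace in String" can be unreliable.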