Please help me keep only the alphanumeric characters in a string.
I referred to the post below, but I don't want to remove each special character individually; I want to keep only the alphanumeric characters. Using split and join for each special character would take many lines of code. Is there another way to keep only alphanumeric characters?
Remove escape characters from string using JOLT
Input:
{
  "name": ",ra@~vi7^!~);\"8%." // any special character may appear here
}
Expected output:
{
  "name": "ravi78"
}
I have resolved a similar problem here:
Jolt Transformation for replacing special characters in a string
We can use the same pattern of joining two char arrays: one built from the input string, and another holding the characters to act on. In the linked answer the action was to match and replace; in your case the action is just to match.
The pattern works as follows:
Spec 1 - modify-overwrite-beta:
- Create a char array from the input.
- Create the valid char dataset (alphanumeric in this case).
Spec 2 - shift:
- Loop over the name char array and create an object with each char as the object key, storing the char's index in a matchIndex array to preserve the order.
- Loop over the alphanumeric array and create an object with each char as the key, adding an isLegal flag field. Chars that appear in both arrays (the intersection) end up with both fields: matchIndex and isLegal.
Spec 3 - shift:
- For each object containing both matchIndex and isLegal, store the object's char key in a name array at each index stored in matchIndex. The new array will contain nulls, since the kept indexes have gaps wherever an invalid char was removed, including before the first valid char.
Spec 4 - modify-overwrite-beta:
- Apply squashNulls to the name array, then join it into a string.
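To make the data flow easier to follow, here is a rough Python sketch that mimics the four steps outside Jolt. It is only an illustration of the idea, not how Jolt executes the spec internally; the function name keep_alphanumeric and the variable names are my own, while matchIndex and isLegal mirror the field names used in the spec.

```python
import string


def keep_alphanumeric(name):
    # Step 1: split the input into chars and build the legal char dataset.
    name_chars = list(name)
    legal_chars = list(
        string.ascii_lowercase + string.ascii_uppercase + string.digits
    )

    # Step 2: build one object per distinct char, keyed by the char itself.
    # Input chars record their positions; legal chars get a flag.
    by_char = {}
    for index, ch in enumerate(name_chars):
        by_char.setdefault(ch, {}).setdefault("matchIndex", []).append(index)
    for ch in legal_chars:
        by_char.setdefault(ch, {})["isLegal"] = 1

    # Step 3: keep only chars that carry both fields (the intersection)
    # and put each one back at its original position. Gaps stay as None.
    slots = [None] * len(name_chars)
    for ch, fields in by_char.items():
        if "matchIndex" in fields and "isLegal" in fields:
            for index in fields["matchIndex"]:
                slots[index] = ch

    # Step 4: drop the gaps and join the remaining chars into a string.
    return "".join(ch for ch in slots if ch is not None)


print(keep_alphanumeric(',ra@~vi7^!~);"8%.'))  # prints: ravi78
```

Keying the intermediate objects by the char is what makes the intersection work: a char survives only if both passes wrote into the same object.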
Here is the final spec:
[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "nameArray": "=split('',@(1,name))",
      "legalAlphaSmall": "abcdefghijklmnopqrstuvwxyz",
      "legalAlphaCaps": "=toUpper(@(1,legalAlphaSmall))",
      "legalNumeric": "0123456789",
      "legalCharStr": "=concat(@(1,legalAlphaSmall),@(1,legalAlphaCaps),@(1,legalNumeric))",
      "legalCharArray": "=split('',@(1,legalCharStr))"
    }
  },
  {
    "operation": "shift",
    "spec": {
      "nameArray": {
        "*": {
          "$": "@0.matchIndex[]"
        }
      },
      "legalCharArray": {
        "*": {
          "#1": "@0.isLegal"
        }
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "*": {
        "isLegal": {
          "@1,matchIndex": {
            "*": {
              "*": {
                "$4": "name[&1]"
              }
            }
          }
        }
      }
    }
  },
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "*": "=squashNulls",
      "name": "=join('',@(1,&))"
    }
  }
]
Having resolved this, I'm not sure Jolt is the right tool for this kind of task, at least not until Jolt supports regex functions. I think this should be handled outside Jolt. For example, if you are using NiFi, you can simply use the UpdateRecord processor, or a combination of EvaluateJsonPath, UpdateAttribute and a simple Jolt spec to set the final value. My concern with a spec like this is performance, especially when you are dealing with large data and/or complex patterns. What do you think @BarbarosÖzhan?
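For comparison, this is roughly what the same filtering looks like wherever a regex is available. It is a minimal sketch in Python, assuming the record arrives as a JSON string with a top-level name field; the function name clean_name is hypothetical, and this is not a NiFi processor configuration.

```python
import json
import re


def clean_name(record_json):
    # Parse the record, strip every char that is not an ASCII letter
    # or digit from "name", and serialize the record again.
    record = json.loads(record_json)
    record["name"] = re.sub(r"[^A-Za-z0-9]", "", record["name"])
    return json.dumps(record)


print(clean_name(r'{"name": ",ra@~vi7^!~);\"8%."}'))
# prints: {"name": "ravi78"}
```

A single negated character class replaces the whole four-step spec, which is the main reason I would prefer doing this outside Jolt.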