Tags: json, apache-nifi, jolt

Jolt Transformation for replacing special characters in a string


I need to write a Jolt spec for the following transformation. Input:

[
  {
    "value": "My name is-Yash:Jain",
    "encoding": [
      {
        "enc": " ",
        "dec": "space"
      },
      {
        "enc": "-",
        "dec": "minus"
      },
      {
        "enc": ":",
        "dec": "colon"
      }
    ]
  }
]

The output should be:

{
  "value": "MyspacenamespaceisminusYashcolonJain"
}

Basically, I need to loop through the entire encoding list and replace every instance of each enc in value with the corresponding dec.
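For reference, the intended behaviour can be sketched outside Jolt. This is a minimal Python illustration, not a Jolt spec; the function name `apply_encoding` is hypothetical. It applies the pairs one after another, so it assumes no `dec` string contains a character that a later `enc` would match (true for the sample input).

```python
def apply_encoding(record):
    """Replace every occurrence of each `enc` in `value` with its `dec`."""
    value = record["value"]
    # Apply the pairs in list order; each pass replaces all occurrences.
    for pair in record["encoding"]:
        value = value.replace(pair["enc"], pair["dec"])
    return {"value": value}


record = {
    "value": "My name is-Yash:Jain",
    "encoding": [
        {"enc": " ", "dec": "space"},
        {"enc": "-", "dec": "minus"},
        {"enc": ":", "dec": "colon"},
    ],
}

print(apply_encoding(record))
# {'value': 'MyspacenamespaceisminusYashcolonJain'}
```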

I've already tried the following transformation, which hard-codes each enc/dec pair instead of reading them from encoding:

[
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "*": {
        "t1": "=split(' ',@(1,value))",
        "t2": "=join('space',@(1,t1))",
        "t3": "=split('-',@(1,t2))",
        "t4": "=join('minus',@(1,t3))",
        "t5": "=split(':',@(1,t4))",
        "t6": "=join('colon',@(1,t5))"
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "*": {
        "t6": "value"
      }
    }
  }
]

Solution

  • It's good to be back doing Jolt, even for a short period :). I might be rusty, so I'm not sure this is the best solution out there, but I was able to make it work dynamically with the spec below. It's hard to explain and the comments might be confusing, so I recommend running each operation one by one at first to understand the steps the spec goes through to reach the desired transformation.

    [
      // Split the value string into an array of characters.
      {
        "operation": "modify-overwrite-beta",
        "spec": {
          "*": {
            "value": "=split('',@(1,value))"
          }
        }
      },
      // This is the key operation: it groups the unique characters from the
      // enc values and from the array above. Each unique character gets a
      // placeholder array, plus an indexes array listing the positions where
      // it occurs in the value array. Because of how the shift is organized,
      // the placeholder at index 0 is the correct replacement value.
      {
        "operation": "shift",
        "spec": {
          "*": {
            "encoding": {
              "*": {
                "dec": "[&3].@1,enc.placeholder[]"
              }
            },
            "value": {
              "*": {
                "@": "[&3].@1.placeholder[]",
                "$": "[&3].@1.indexes[]"
              }
            }
          }
        }
      },
      // Loop through each unique character from above, then through each of
      // its matched indexes, and build a new array with placeholder[0] placed
      // at every matched index, so that the original order is preserved.
      {
        "operation": "shift",
        "spec": {
          "*": {
            "*": {
              "indexes": {
                "*": {
                  "@2,placeholder[0]": "[&4].value[@1]"
                }
              }
            }
          }
        }
      },
      // Join the value array from above to get the final value string.
      {
        "operation": "modify-overwrite-beta",
        "spec": {
          "*": {
            "value": "=join('',@(1,value))"
          }
        }
      }
    ]
    

    @Barbaros Özhan, please feel free to add to or modify my spec as you see fit.
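The four operations of the spec in this answer may be easier to follow as ordinary code. The following is a rough Python model of the same idea, not a translation of Jolt semantics; the function name `replace_via_grouping` is hypothetical. It assumes every `enc` is a single character, and it adds the encoding entries before the value characters so that `placeholder[0]` holds the replacement whenever one exists.

```python
def replace_via_grouping(record):
    # Step 1: split the value string into an array of characters.
    chars = list(record["value"])

    # Step 2: group by unique character. Encoding entries go in first, so
    # placeholder[0] is the `dec` value when the character has one.
    groups = {}
    for pair in record["encoding"]:
        group = groups.setdefault(pair["enc"], {"placeholder": [], "indexes": []})
        group["placeholder"].append(pair["dec"])
    for index, ch in enumerate(chars):
        group = groups.setdefault(ch, {"placeholder": [], "indexes": []})
        group["placeholder"].append(ch)   # the character itself as a fallback
        group["indexes"].append(index)    # where it occurs in the value

    # Step 3: put placeholder[0] back at every matched index, which
    # preserves the original character order.
    result = [None] * len(chars)
    for group in groups.values():
        for index in group["indexes"]:
            result[index] = group["placeholder"][0]

    # Step 4: join the array into the final string.
    return {"value": "".join(result)}


record = {
    "value": "My name is-Yash:Jain",
    "encoding": [
        {"enc": " ", "dec": "space"},
        {"enc": "-", "dec": "minus"},
        {"enc": ":", "dec": "colon"},
    ],
}

print(replace_via_grouping(record))
# {'value': 'MyspacenamespaceisminusYashcolonJain'}
```

Because each character is looked up once by position, a `dec` string is never rescanned for further replacements, which is one difference from chaining replace calls.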