
Apache NiFi: Remove non-alphanumeric characters from only one attribute in JSON


I hope someone can help me. I am learning Apache NiFi by working on a project where I have JSON files in the following format:

{
  "network": "reddit",
  "posted": "2021-12-24 10:46:51 +00000",
  "postid": "rnjv0z",
  "title": "A gil commission artwork of my friends who are in-game couples!",
  "text": "A gil commission artwork of my friends who are in-game couples! ",
  "lang": "en",
  "type": "status",
  "sentiment": "neutral",
  "image": "https://a.thumbs.redditmedia.com/ShKq9bu4_ZIo4k5QIBYotstmyGidRgn8046RcqPo_p0.jpg",
  "url": "http://www.reddit.com/r/ffxiv/comments/rnjv0z/a_gil_commission_artwork_of_my_friends_who_are/",
  "user": {
    "userid": "Suhteeven",
    "name": "Suhteeven",
    "url": "http://www.reddit.com/user/Suhteeven"
  },
  "popularity": [
    {
      "name": "ups",
      "count": 1
    },
    {
      "name": "comments",
      "count": 0
    }
  ]
}

I want to remove all non-alphanumeric characters from the "text" attribute. Only this one attribute should be modified; the rest of the file should remain the same.
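For illustration, the intended transformation can be sketched outside NiFi as a standalone Groovy script (Groovy being the language of the script body in the solution). This is a sketch under stated assumptions: `cleanTextField` is a hypothetical helper, the sample is trimmed to a few fields, and spaces are kept on the assumption that only punctuation and symbols should go.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical helper (not part of NiFi): parses the JSON document, strips every
// character that is not an ASCII letter, digit or space from the "text" field,
// and serializes the document again. All other fields are left untouched.
// Keeping spaces is an assumption; use /[^A-Za-z0-9]/ to remove them as well.
String cleanTextField(String json) {
    def doc = new JsonSlurper().parseText(json)
    if (doc.text instanceof String) {
        doc.text = doc.text.replaceAll(/[^A-Za-z0-9 ]/, '')
    }
    return JsonOutput.toJson(doc)
}

// A trimmed-down version of the sample document from the question
def input = '''{
  "network": "reddit",
  "postid": "rnjv0z",
  "text": "A gil commission artwork of my friends who are in-game couples! ",
  "user": {"userid": "Suhteeven"}
}'''

def output = new JsonSlurper().parseText(cleanTextField(input))
println output.text      // the "-" and "!" are gone
println output.network   // unchanged
```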

I tried using the EvaluateJsonPath processor, where I added a text attribute. Then I created a ReplaceText processor.

This configuration cleaned the special characters from the text, but the output contained only the value of the text attribute. I don't want to lose the other information; my goal is to have all attributes in the output, with only the text attribute's value modified.

I also tried the UpdateAttribute processor, but it didn't change my JSON at all (the output was the same as the input).

Can you please tell me which processors I should use, and with what configurations? I have tried many different things, but I am stuck.


Solution

  • This is possible with the ScriptedTransformRecord processor:

    • Record Reader: JsonTreeReader

    • Record Writer: JsonRecordSetWriter

    • Script Language (default): Groovy

    • Script Body:

    // Overwrite the record's "text" field with the cleaned FlowFile attribute
    record.setValue("text", attributes['text'])
    // Return the modified record so the writer emits it
    record
    

    Data flow: EvaluateJsonPath (extract "text" into a FlowFile attribute) -> UpdateAttribute (clean the text attribute) -> ScriptedTransformRecord (write the cleaned value back into the record)
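A possible configuration for the first two processors in that flow, sketched under assumptions: the attribute is named `text` to match `attributes['text']` in the script body, and the regex keeps letters, digits and spaces (drop the space from the character class to remove spaces too). Setting EvaluateJsonPath's Destination to `flowfile-attribute` leaves the FlowFile content intact; UpdateAttribute then rewrites only the attribute using the Expression Language `replaceAll` function, which is why it does not change the JSON on its own.

```
EvaluateJsonPath
    Destination:  flowfile-attribute
    text:         $.text                                    (dynamic property: attribute name -> JsonPath)

UpdateAttribute
    text:         ${text:replaceAll('[^A-Za-z0-9 ]', '')}   (dynamic property: attribute name -> new value)
```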