Search code examples
jsonapache-nifiprocessorapache-calcite

in Nifi and with QueryRecord processor can we add a new column that is a regex of another column


in Nifi and with QueryRecord processor can we add a new column that is a regex of another column in a query?

Like : SELECT info, SUBSTRING(info, "([^\s]+)") as f_name FROM FLOWFILE

I don't want to split my flowfile, ExtractText, UpdateAttributes, then attributesToJson and MergeContent. seems like along loop and it will take time if we talking about 400MB per FlowFile with 100k+ rows each

input:

{"info":"Rachel: %Robert-100-400-4444-Mrs"}
{"info":": %Martin-200-500-5555-Mr"}
{"info":"%Holand-300-600-6666-Mr"}

desired output:

{"info":"Rachel: %Robert-100-400-4444-Mrs", "f_name":"Rachel","l_name":"Robert","area_code":100,"last_four_digit":4444,"title":"Mrs"}
{"info":": %Martin-200-500-5555-Mr", "f_name":"","l_name":"Martin","area_code":200,"last_four_digit":5555,"title":"Mr"}
{"info":"%Holand-300-600-6666-Mr", "f_name":"","l_name":"Holand","area_code":300,"last_four_digit":6666,"title":"Mr"}

Solution

  • QueryRecord will allow you to filter the records in the flowfile using a regex via LIKE (one of the examples on the Additional Details page of the docs), but to update the record, you'll need to use UpdateRecord.

    UpdateRecord uses the RecordPath DSL syntax, which also has regex functions like replaceRegex and matchesRegex.