Search code examples
jsonsolrapache-nifijolt

How can I get NiFi to take a multi value that is in a database single column to populate Solr index which is an array?


Been writing this question for hours...

I'm thinking that I may need to use the Jolt Transformer processor for this.

Example A single entry in a database column called "A" contains this value in a varchar field. "Alabama","Ohio","Texas","Missouri"

Again the above value is 1 entry in the column "A".

The data needs to populate a Solr Index field called "STATES".

Currently I have an ExecuteSQLRecord pulling the database data with a JsonRecordSetWriter for the Record Writer. This processor connects to my PutSolrContentStream.

The problem seems to be that the data always gets wrapped from column A in quotes " and all" already in the field get escaped.

How can I override this in order to manually set the quotes to be used?

With the way things are working right now the data sent to Solr looks like the following from the query call.

[
 {
  "STATES": "\"Alabama\",\"Ohio\",\"Texas\",\"Missouri\""
 }
]

or other attempts to see what I can get it to output are like

"STATES": "Alabama,Ohio,Texas,Missouri"
"STATES": "'Alabama','Ohio','Texas','Missouri'"

What I need is for the output to be call.

[
 {
  "STATES": ["Alabama","Ohio","Texas","Missouri"]
 }
]

Would the Jolt Transformer be the solution

Examples I've seen with the Jolt have been simple where a specific value gets changed such as STATES becomes "MY_STATES". But I'm at a loss at the moment for how to convert the

"STATES": "Alabama,Ohio,Texas,Missouri"

to

"STATES": ["Alabama","Ohio","Texas","Missouri"]

Solution

  • You just can use a split function within a modify-overwrite-beta transformation such as

    [
      {
        "operation": "modify-overwrite-beta",
        "spec": {
          "STATES": "=split(',',@(1,&))"  // & stands for the current level attribute
        }
      }
    ]
    

    for the input

    {
      "STATES": "Alabama,Ohio,Texas,Missouri"
    }
    

    You can test by putting the Specification and Input into their boxes after toggling the ADVANCED tab illustrated in the following image :

    enter image description here