
Include base64 code of image in csv file using Nifi


I have a JSON array response from InvokeHTTP, and I am using the flow below to convert some of the JSON fields to CSV. One of the fields is an id, which is used to fetch an image that is then converted to Base64. I need to add this Base64 string to my CSV, but I don't understand how to save it in an attribute so that it can be passed to AttributesToCSV.

[Image: NiFi flow diagram to convert the JSON response to CSV]

Also, I read here https://community.cloudera.com/t5/Support-Questions/Nifi-attribute-containing-large-text-value/td-p/190513 that storing large values in attributes is not recommended because of memory concerns. What would be an optimal approach in this scenario?

JSON response from the first call:

[ {
  "fileNumber" : "1",
  "uuid" : "abc",
  "attachedFiles" : [ {
    "id" : "bkjdbkjdsf",
    "name" : "image1.png"
  }, {
    "id" : "xzcv",
    "name" : "image2.png"
  } ],
  "date" : null
}, {
  "fileNumber" : "2",
  "uuid" : "def",
  "attachedFiles" : [ ],
  "date" : null
} ]

Final CSV (expected output after merge):

Id,File Name,File Data(base64 code)
bkjdbkjdsf,image1.png,iVBORw0KGgo...ji
xzcv,image2.png,ZEStWRGau..74

My approach (open to change based on suggestions): after splitting the JSON response, I use EvaluateJsonPath to get "attachedFiles". I check the length of the "attachedFiles" array and split further if it contains two or more files; if it is empty, I do nothing. In a second EvaluateJsonPath I add the properties Id and File Name and set their values from the JSON using $.id and so on. I then use the Id to invoke another URL, whose response I encode to Base64.
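
For reference, the extraction described above amounts to flattening each record's "attachedFiles" into one row per file. A minimal sketch in plain Groovy, outside NiFi, assuming the response shape shown earlier (the variable names `json`, `rows` and `csv` are illustrative only, and the trailing commas of the sample are dropped so that it parses):

```groovy
import groovy.json.JsonSlurper

// Sample shaped like the response above (hypothetical inline copy).
def json = '''[
  {"fileNumber": "1", "uuid": "abc",
   "attachedFiles": [{"id": "bkjdbkjdsf", "name": "image1.png"},
                     {"id": "xzcv", "name": "image2.png"}],
   "date": null},
  {"fileNumber": "2", "uuid": "def", "attachedFiles": [], "date": null}
]'''

// One [id, name] pair per attached file; records without files contribute nothing.
def rows = new JsonSlurper().parseText(json).collectMany { record ->
    record.attachedFiles.collect { f -> [f.id, f.name] }
}

// Header plus one comma-separated line per file.
def csv = (['Id,File Name'] + rows.collect { it.join(',') }).join('\n')
println csv
```

This only covers the two columns already produced by the flow; the Base64 column still requires fetching each image by its id.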

Current output: a CSV file that still needs a third column, File Data(base64 code), and its value:

Id,File Name
bkjdbkjdsf,image1.png
xzcv,image2.png

Solution

  • One option is to use ExecuteGroovyScript:

    def ff = session.get()
    if (!ff) return
    
    ff.write{sin, sout->
        sout.withWriter('UTF-8'){w->
            // write the values of attributes 'Id' and 'filename', delimited with a comma
            w << ff.attributes.with{a->[a.'Id', a.'filename']}.join(',')
            w << ',' // write a comma
            
            //sin.withReader('UTF-8'){r-> w << r} // would write the current flowfile content after the last comma
            w << sin.bytes.encodeBase64() // write the content as a one-line Base64 string
            w << '\n'
        }
    }
    REL_SUCCESS << ff
    

    UPD: I replaced copying the flowfile content with sin.bytes.encodeBase64(), which produces a single-line Base64 string from the input file. If you use this option, remove Base64EncodeContent from the flow to prevent double Base64 encoding.
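
Regarding the memory concern from the question: the script keeps the Base64 data in the flowfile content rather than in an attribute, but `sin.bytes` still reads the whole image into memory before encoding. A possible variation, sketched here under the assumption that Java's `java.util.Base64` is acceptable in place of Groovy's `encodeBase64()`, streams the encoding instead. `writeCsvRow` is a hypothetical helper name, not a NiFi API, and the standalone check uses in-memory streams in place of real flowfile content.

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical helper (not part of NiFi): writes one CSV row "id,name,<base64>\n"
// while streaming the Base64 encoding of sin instead of loading it all at once.
def writeCsvRow(String id, String name, InputStream sin, OutputStream sout) {
    sout.write((id + ',' + name + ',').getBytes(StandardCharsets.UTF_8))

    // Closing the encoder stream emits the final padding but also closes its
    // target, so the target is shielded: close() here only flushes.
    def shield = new FilterOutputStream(sout) {
        @Override void close() { flush() }
    }
    def encoder = Base64.getEncoder().wrap(shield)  // basic encoder: no line breaks
    encoder << sin      // Groovy reads the InputStream chunk by chunk
    encoder.close()

    sout.write('\n'.getBytes(StandardCharsets.UTF_8))
}

// Standalone check with an in-memory stream standing in for the flowfile content.
def buffer = new ByteArrayOutputStream()
writeCsvRow('xzcv', 'image2.png',
        new ByteArrayInputStream('hello'.getBytes('UTF-8')), buffer)
print buffer.toString('UTF-8')   // xzcv,image2.png,aGVsbG8=
```

Inside ExecuteGroovyScript, the helper could be called from the `ff.write{sin, sout-> ...}` closure with the attribute values, in place of the writer-based body. As with the original script, the output is already Base64, so Base64EncodeContent should not be applied again.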