
Getting an Empty Array Using a Groovy Script in NiFi


I have a requirement to parse the data into the following format.

Input:

{
 "Message" : "\nRecord 1:\nRequired data is missing. \n\nRecord 2:\nprocessing failed\n"
}

Here the content and delimiters are not fixed. The only fixed part is the \nRecord keyword, which my script splits on. However, I am not getting the desired output using Groovy.

Desired output:

[
  {
    "Record 1": "Required data is missing"
  },
  {
    "Record 2": "processing failed"
  }
]

I have written a Groovy script for this, but I am getting an empty array.

import org.apache.commons.io.IOUtils
import groovy.json.*
import java.util.ArrayList
import java.nio.charset.*
import java.nio.charset.StandardCharsets
import groovy.json.JsonSlurper
import groovy.json.JsonBuilder


def flowFile = session.get()
if(!flowFile) return
try {
    flowFile = session.write(flowFile,
        { inputStream, outputStream ->
            def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            splitted = text.split('\nRecord')
            int j = splitted.size()
            final1 = []
            for (int i=0;i<j-1;i++)
            {
                k = "Record " + splitted[i+1]
                valid = k.replaceAll("\\n|\"|\\n|}","")
                final1.add("{\"" + valid.replaceFirst(":",'":"')+ "\"}" )
            }
            def json = JsonOutput.toJson(final1)
            outputStream.write(JsonOutput.prettyPrint(json).getBytes(StandardCharsets.UTF_8))
        } as StreamCallback)

    session.transfer(flowFile, REL_SUCCESS)
} catch(Exception e) {
    log.error('Error during JSON operations', e)
    flowFile = session.putAttribute(flowFile, "error", e.getMessage())
    session.transfer(flowFile, REL_FAILURE)
}

Can you please help me with this?

Thank you.


Solution

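  • I would parse the JSON first and then use a regex with a simple trick: replace the newlines in front of each "Record N:" header with a marker character (here "=", assumed not to occur in the message text), then capture everything up to the next marker: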

    import groovy.json.*
    
    def json = new JsonSlurper().parseText '{ "Message" : "\nRecord 1:\nRequired data is missing. \n\nRecord 2:\nprocessing failed\nRecord 3:\nprocessing failed badly\n" }'
    
    String msg = json.Message.replaceAll( /\n+(Record \d+:)/, '=$1' ) // THE trick!
    
    List res = ( msg =~ /(?m)(Record \d+):([^=]+)*/ ).collect{ _, rec, text -> [ (rec):text.trim() ] }
    
    assert JsonOutput.toJson( res ) == '[{"Record 1":"Required data is missing."},{"Record 2":"processing failed"},{"Record 3":"processing failed badly"}]'
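
To connect this back to the script in the question, here is a minimal sketch, assuming the flowfile content is exactly the JSON shown under "Input". In that case the raw text holds the two-character escape \n (a backslash followed by n), not a real newline, so text.split('\nRecord') probably never matches and the loop never runs, which would explain the empty array. Parsing with JsonSlurper first turns the escapes into real newlines. The helper name parseRecords is hypothetical, and the "=" marker is assumed never to appear in the message text.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical helper: turns the raw flowfile JSON into a list of single-entry maps.
List<Map<String, String>> parseRecords(String rawJson) {
    // Parsing first converts the JSON escape \n into a real newline character.
    String message = new JsonSlurper().parseText(rawJson).Message
    // Mark each record boundary with '=' (assumed never to occur in the message text).
    String marked = message.replaceAll(/\n+(Record \d+:)/, '=$1')
    def matcher = marked =~ /(Record \d+):([^=]*)/
    // Each match yields [wholeMatch, recordLabel, recordText].
    return matcher.collect { all, rec, text -> [(rec): text.trim()] }
}

// '\\n' in a single-quoted Groovy string is backslash + n, as in the flowfile content.
def sample = '{ "Message" : "\\nRecord 1:\\nRequired data is missing. \\n\\nRecord 2:\\nprocessing failed\\n" }'

// Splitting the raw text on a real newline finds nothing, so only one element comes back.
assert sample.split('\nRecord').size() == 1

println JsonOutput.prettyPrint(JsonOutput.toJson(parseRecords(sample)))

// Inside the session.write callback, the body could then reduce to something like:
//   def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
//   def json = JsonOutput.toJson(parseRecords(text))
//   outputStream.write(JsonOutput.prettyPrint(json).getBytes(StandardCharsets.UTF_8))
```

Because the helper returns maps rather than hand-built strings, JsonOutput emits an array of objects; adding pre-formatted strings to the list, as the original loop does, would produce an array of JSON strings instead.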