NiFi: Remove fixed number of header lines from file


I'm processing a file and I'd like to remove (trim) the first X header lines so that only the data remains, preferably without using regular expressions.

Thanks


Solution

  • You can remove the first X header lines by using the ExecuteScript processor in NiFi.

    The following is an example Jython script which I wrote for myself:

    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import StreamCallback

    class PyStreamCallback(StreamCallback):
        def __init__(self):
            pass

        def process(self, inputStream, outputStream):
            # Read the whole content into a list of lines (held in memory)
            text = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
            # Skip the first 3 lines and write the rest back out
            for line in text[3:]:
                # Encode explicitly as UTF-8 before writing to the Java OutputStream
                outputStream.write(bytearray((line + "\n").encode('utf-8')))

    flowFile = session.get()
    if flowFile is not None:
        flowFile = session.write(flowFile, PyStreamCallback())
        # Optional: rename the output; remove this line to keep the original filename
        flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename').split('.')[0] + '_translated.json')
        session.transfer(flowFile, REL_SUCCESS)
    

    This removes the first 3 lines; change the 3 in the slice to remove more or fewer lines.

    Hope that helps.
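
The script above reads every line into memory before writing, which may be a concern for large files. As a rough sketch of the same skip-the-first-N-lines idea done lazily, here is a plain Python version that runs outside NiFi. The function name `skip_header_lines` and its `header_lines` parameter are made up for illustration, and the in-memory streams only stand in for the flow file content.

```python
import io
from itertools import islice

def skip_header_lines(reader, writer, header_lines=3):
    """Copy reader to writer, dropping the first `header_lines` lines."""
    # islice skips the header lines lazily, so lines are handled one at a time
    for line in islice(reader, header_lines, None):
        writer.write(line)

# In-memory streams standing in for the input and output content
source = io.StringIO(u"h1\nh2\nh3\nrow1\nrow2\n")
target = io.StringIO()
skip_header_lines(source, target, header_lines=3)
print(target.getvalue())
```

If the input has fewer lines than `header_lines`, nothing is written and the output is empty.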