csv, apache-nifi, data-processing

Apache NiFi: Mapping a CSV with multiple columns to create new rows


I found a similar question on Stack Overflow. That approach worked fine with just a couple of columns, but I realised it is not practical for CSVs with a large number of columns.

I have a CSV with 75 columns. I decided to follow the approach from the question linked above. As described there, I added an UpdateRecord processor and configured a CSVReader and a CSVRecordSetWriter. Then, as instructed, I entered my Schema Text, which was very long because it required me to define every column. The CSVRecordSetWriter was then reported as invalid.

I noticed that the schema became invalid once it included more than a certain number of column definitions.

Part of my schema looks like this:

{
   "type":"record",
   "name":"test2.csv",
   "namespace":"my.namespace",
   "fields":[
      {
         "name":"download",
         "type":"string"
      },
      {
         "name":"upload",
         "type":"string"
      }
      .
      .
      .
      .
      {
         "name":"operatorId",
         "type":"string"
      },
      {
         "name":"errorCode",
         "type":"string"
      }      
   ]
}

My CSV also contains a header row.

Objective: I need to map the data in the errorCode column to a new column named errorMean. I hope you can suggest a way to achieve this. Feel free to suggest a solution that skips writing out the Schema Text entirely.


Solution

  • I found a similar question on Stack Overflow. That approach worked fine with just a couple of columns, but I realised it is not practical for CSVs with a large number of columns.

    To avoid providing a very large schema, set the CSVReader's Schema Access Strategy to Infer Schema and the CSVRecordSetWriter's Schema Access Strategy to Inherit Record Schema. The schema is then inferred when the CSV is read, and the same schema is used to write the CSV.

    The rest of the mapping works the same as described in the answer you linked.
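
If it helps to see the idea outside NiFi, here is a minimal Python sketch of the same logic: the column names are taken from the header row instead of a hand-written schema, and a new errorMean column is derived from errorCode. This is not NiFi code, only an illustration. The lookup table `ERROR_MEANINGS`, the `"Unknown"` default, the helper name `add_error_mean`, and the sample rows are all hypothetical, since the question does not list the real error codes or their meanings.

```python
import csv
import io

# Hypothetical lookup table; the real codes and meanings are not given in the question.
ERROR_MEANINGS = {"0": "OK", "1": "Timeout"}


def add_error_mean(csv_text, lookup, default="Unknown"):
    """Return csv_text with an extra errorMean column derived from errorCode."""
    # DictReader takes the column names from the header row, so no schema is written by hand.
    reader = csv.DictReader(io.StringIO(csv_text))
    # Reuse the columns that were read and append the new one.
    fieldnames = list(reader.fieldnames) + ["errorMean"]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Map the code to its meaning; unmapped codes fall back to the default.
        row["errorMean"] = lookup.get(row["errorCode"], default)
        writer.writerow(row)
    return out.getvalue()


# Made-up sample with only four of the columns, for brevity.
sample = "download,upload,operatorId,errorCode\n10,2,A1,0\n7,1,B2,1\n3,0,C3,9\n"
result = add_error_mean(sample, ERROR_MEANINGS)
print(result)
```

The function never names any column other than errorCode and errorMean, so it works the same for 4 columns or 75, which is the point of inferring the schema on read and reusing it on write.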