
Break CSV into multiple parts with Univocity


I have a CSV file with multiple datasets in it. For example:

FIELD1, 10, FIELD2, 20, FIELD3, 30 // dataset1 begins here
FIELD4, 40, FIELD5, 50 // line 2 of dataset1
SUBFIELD1_ROW1, 100, SUBFIELD2_ROW1, 200 // subsection of dataset1: line1
SUBFIELD1_ROW2, 300, SUBFIELD2_ROW2, 400 // subsection of dataset1: line2
SUBFIELD1_ROW3, 500, SUBFIELD2_ROW3, 600 // subsection of dataset1: line3
FIELD1, 10, FIELD2, 20, FIELD3, 30 // dataset2 begins here
FIELD4, 40, FIELD5, 50 // line 2 of dataset2
SUBFIELD1_ROW1, 100, SUBFIELD2_ROW1, 200 // subsection of dataset2: line1
SUBFIELD1_ROW2, 300, SUBFIELD2_ROW2, 400 // subsection of dataset2: line2
SUBFIELD1_ROW3, 500, SUBFIELD2_ROW3, 600 // subsection of dataset2: line3
// dataset 3
// dataset 4 and so on

Is it possible to break this CSV into 4 parts (one for each dataset)? I looked through the test classes on the Univocity GitHub page but couldn't find a similar example.


Solution

  • Check this example. Basically, you need to use an InputValueSwitch that targets the first column. Add a switch for "FIELD1", another for "FIELD4", and another for values that start with "SUBFIELD". You associate a different processor with each possible row type using:

    inputSwitch.addSwitchForValue(<your column matcher>, processorForRowWhereMatcherReturnsTrue);
    

    What happens when the format of the row changes is for you to decide. You can override the following method of the InputValueSwitch to do whatever you need:

    public void rowProcessorSwitched(RowProcessor from, RowProcessor to)

    Check these other related questions:

    Univocity - parse each TSV file row to different Type of class object

    Univocity CSV parser multiple beans with multiple rows in single CSV

    Modifying complex csv files in java
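
For readers who want to see the grouping logic itself, here is a minimal plain-Java sketch that does not use Univocity at all. It only illustrates the idea behind switching on the first column: a row whose first column is "FIELD1" is assumed to open a new dataset, and every following row belongs to that dataset until the next "FIELD1" row. The class name `DatasetSplitter` and its `split` method are hypothetical, and the naive comma split ignores quoted fields, which a real CSV parser would handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Univocity: groups CSV lines into datasets.
class DatasetSplitter {

    // Assumption taken from the sample input: a row whose first column
    // equals "FIELD1" marks the beginning of a new dataset.
    static List<List<String[]>> split(List<String> lines) {
        List<List<String[]>> datasets = new ArrayList<>();
        List<String[]> current = null;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Naive split on commas; does not handle quoted values.
            String[] row = trimmed.split("\\s*,\\s*");
            if (row[0].equals("FIELD1")) {
                current = new ArrayList<>();
                datasets.add(current);
            }
            if (current != null) {
                current.add(row); // rows before the first FIELD1 are dropped
            }
        }
        return datasets;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "FIELD1, 10, FIELD2, 20, FIELD3, 30",
            "FIELD4, 40, FIELD5, 50",
            "SUBFIELD1_ROW1, 100, SUBFIELD2_ROW1, 200",
            "FIELD1, 10, FIELD2, 20, FIELD3, 30",
            "FIELD4, 40, FIELD5, 50",
            "SUBFIELD1_ROW1, 100, SUBFIELD2_ROW1, 200");
        List<List<String[]>> datasets = split(lines);
        for (int i = 0; i < datasets.size(); i++) {
            System.out.println("dataset " + (i + 1) + ": "
                + datasets.get(i).size() + " rows");
        }
    }
}
```

With Univocity, the same decision (start a new dataset when a "FIELD1" row appears) would live in the processors attached to the InputValueSwitch or in `rowProcessorSwitched`, rather than in a hand-written loop like this one.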