I have irregular (albeit consistent) "csv" files I need to parse. Content looks like this:
Field1: Field1Text
Field2: Field2Text
Field3 (need to ignore)
Field4 (need to ignore)
Field5
Field5Text
// Cars - for example
#,Col1,Col2,Col3,Col4,Col5,Col6
#1,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text
#2,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text
#3,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text
Ideally I would like to use a similar approach as here.
I ultimately want to end up with an object like:
String field1;
String field2;
String field5;
List<Car> cars;
I currently have the following problems:
Your first issue is with the # character, which by default is treated as a comment character. To prevent lines starting with # from being treated as comments, do this:

    parserSettings.getFormat().setComment('\0');
As for the structure you are parsing, there is no out-of-the-box way to handle it, but it's easy to leverage the API for it. The following should work:
    import com.univocity.parsers.common.processor.BeanListProcessor;
    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;
    import java.io.File;
    import java.util.ArrayList;

    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setComment('\0'); // prevent lines starting with # from being parsed as comments

    // Create a parser
    CsvParser parser = new CsvParser(settings);

    // Open the input
    parser.beginParsing(new File("/path/to/input.csv"), "UTF-8");

    // Create a BeanListProcessor for instances of Car, and initialize it
    BeanListProcessor<Car> carProcessor = new BeanListProcessor<Car>(Car.class);
    carProcessor.processStarted(parser.getContext());

    String[] row;
    Parent parent = null;
    boolean expectField5Text = false; // set when the bare "Field5" label row has just been read
    while ((row = parser.parseNext()) != null) { // read rows one by one
        if (expectField5Text) { // the previous row was the "Field5" label, so this row holds its text
            parent.field5 = row[0];
            expectField5Text = false;
        } else if (row[0].startsWith("Field1:")) { // when Field1 is found, create your parent instance
            if (parent != null) { // a parent already exists, so its cars have been read: attach them
                parent.cars = new ArrayList<Car>(carProcessor.getBeans()); // copy the list of cars from the processor
                carProcessor.getBeans().clear(); // clear the processor list
                // you probably want to do something with your parent bean here
            }
            parent = new Parent(); // create a fresh parent instance
            parent.field1 = row[0]; // row[0] is the whole line, label included; strip the label if you only want the text
        } else if (row[0].startsWith("Field2:")) {
            parent.field2 = row[0]; // and so on
        } else if (row[0].equals("Field5")) { // in the sample, "Field5" is on its own line, without a colon
            expectField5Text = true;
        } else if (row[0].equals("#")) {
            // header row of the car table ("#,Col1,..."): skip it so it does not become a Car
        } else if (row[0].startsWith("#")) { // got a "Car" row: invoke the rowProcessed method of the carProcessor
            carProcessor.rowProcessed(row, parser.getContext());
        }
    }

    // At the end, if there is a parent, get the cars parsed
    if (parent != null) {
        parent.cars = carProcessor.getBeans();
    }
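In the sample input, Field5 sits on its own line with its text on the following line, and the `#,Col1,...` header row also starts with `#`. Here is a minimal, library-free sketch of row dispatch that accounts for both, assuming one record per file. The `ParsedFile`, `CarRow`, and `IrregularFileSketch` names are hypothetical and not part of univocity-parsers; the same branching could be applied to `row[0]` inside the parser loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder types for this sketch; not part of univocity-parsers.
class CarRow {
    final String id;
    final List<String> cols;

    CarRow(String id, List<String> cols) {
        this.id = id;
        this.cols = cols;
    }
}

class ParsedFile {
    String field1;
    String field2;
    String field5;
    final List<CarRow> cars = new ArrayList<>();
}

class IrregularFileSketch {
    // Returns the text that follows the label, with surrounding whitespace removed.
    static String valueAfterLabel(String line, String label) {
        return line.substring(label.length()).trim();
    }

    // Assumes a single record per input, laid out like the sample in the question.
    static ParsedFile parse(List<String> lines) {
        ParsedFile result = new ParsedFile();
        boolean expectField5Text = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (expectField5Text) {
                // The previous line was the bare "Field5" label.
                result.field5 = line;
                expectField5Text = false;
            } else if (line.startsWith("Field1:")) {
                result.field1 = valueAfterLabel(line, "Field1:");
            } else if (line.startsWith("Field2:")) {
                result.field2 = valueAfterLabel(line, "Field2:");
            } else if (line.equals("Field5")) {
                expectField5Text = true;
            } else if (line.startsWith("#,")) {
                // Header row of the car table: nothing to store.
            } else if (line.startsWith("#")) {
                String[] cells = line.split(",", -1);
                List<String> cols = new ArrayList<>(Arrays.asList(cells).subList(1, cells.length));
                result.cars.add(new CarRow(cells[0], cols));
            }
            // Anything else (Field3, Field4, the "// Cars" line) is ignored.
        }
        return result;
    }
}
```

The naive `split(",")` is only acceptable here because the sample has no quoted values; for real CSV rows, keep the parser in charge of splitting.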
For the BeanListProcessor to work, you need to have your Car class declared like this:
    public static final class Car {
        @Parsed(index = 0)
        String id;
        @Parsed(index = 1)
        String col1;
        @Parsed(index = 2)
        String col2;
        @Parsed(index = 3)
        String col3;
        @Parsed(index = 4)
        String col4;
        @Parsed(index = 5)
        String col5;
        @Parsed(index = 6)
        String col6;
    }
You can use headers instead but it will make you write more code. If the headers are always the same then you can just assume the positions are fixed.
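To give a feel for the extra code that header-based lookup involves, here is a hand-rolled sketch that builds a name-to-position map from the `#,Col1,...` row and reads cells by column name. This is not the library's own header mapping; `HeaderIndexSketch`, `indexByName`, and `cell` are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HeaderIndexSketch {
    // Maps each header name to its column position, e.g. "Col1" -> 1.
    static Map<String, Integer> indexByName(String[] headerRow) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < headerRow.length; i++) {
            index.put(headerRow[i], i);
        }
        return index;
    }

    // Returns the cell under the named column, or null when the column
    // is not in the header or the row is too short to contain it.
    static String cell(String[] row, Map<String, Integer> index, String name) {
        Integer i = index.get(name);
        return (i == null || i >= row.length) ? null : row[i];
    }
}
```

In the parsing loop, the row whose first cell is exactly `#` would be the one to pass to `indexByName`; each later `#1`, `#2`, ... row can then be read with `cell(row, index, "Col3")` regardless of column order.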
Hope this helps