java csv double-quotes apache-commons-csv

CSV parsing with Commons CSV - Quotes within quotes causing IOException


I am using Commons CSV to parse CSV content relating to TV shows. One of the shows has a name that includes double quotes:

116,6,2,29 Sep 10,""JJ" (60 min)","http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj"

The show name is "JJ" (60 min), which is itself enclosed in double quotes. Parsing this line throws java.io.IOException: (line 1) invalid char between encapsulated token and delimiter.

    ArrayList<String> allElements = new ArrayList<>();

    // CSVFormat.DEFAULT uses ',' as the delimiter and '"' as the quote character
    try (CSVParser csvFileParser = new CSVParser(new StringReader(line), CSVFormat.DEFAULT)) {
        // getRecords() is where the IOException is thrown for the malformed line
        for (CSVRecord record : csvFileParser.getRecords()) {
            for (String value : record) {
                allElements.add(value);
            }
        }
    }

    return allElements;

CSVFormat.DEFAULT already sets withQuote('"').

I think this CSV is not properly formatted, as ""JJ" (60 min)" should be """JJ"" (60 min)". Is there a way to get Commons CSV to handle this, or do I need to fix this entry manually?

Additional information: Other show names contain spaces and commas within the CSV entry and are placed within double quotes.
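
The suspicion about the formatting can be illustrated with a simplified model of strict quoted-field parsing. The sketch below is not Commons CSV's actual lexer: `StrictCsvLine` is a hypothetical class that assumes a comma delimiter and a single line of input. It treats `""` inside a quoted field as an escaped quote and requires a closing quote to be followed by a comma or the end of the line, which appears to be the rule the error message refers to.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical strict splitter; a simplified model, not Commons CSV's implementation. */
final class StrictCsvLine {

    static List<String> split(String line) throws IOException {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int n = line.length();
        int i = 0;
        while (true) {
            field.setLength(0);
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                while (true) {
                    if (i >= n) {
                        throw new IOException("unterminated quoted field");
                    }
                    char c = line.charAt(i++);
                    if (c != '"') {
                        field.append(c);
                    } else if (i < n && line.charAt(i) == '"') {
                        field.append('"'); // "" is an escaped quote
                        i++;
                    } else {
                        break; // anything else is taken as the closing quote
                    }
                }
                // After a closing quote only a delimiter or end of line is legal.
                if (i < n && line.charAt(i) != ',') {
                    throw new IOException("invalid char between encapsulated token and delimiter");
                }
            } else {
                // Unquoted field: read up to the next delimiter.
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i++));
                }
            }
            fields.add(field.toString());
            if (i >= n) {
                return fields;
            }
            i++; // skip the delimiter
        }
    }

    public static void main(String[] args) throws IOException {
        // Escaped form of the show name is accepted.
        System.out.println(split("29 Sep 10,\"\"\"JJ\"\" (60 min)\""));
        try {
            // Unescaped form from the question is rejected.
            split("29 Sep 10,\"\"JJ\" (60 min)\"");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this model the unescaped form fails because the leading `""` is read as an empty quoted value, after which the `J` is neither a delimiter nor the end of the line. The escaped form parses to the value "JJ" (60 min).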


Solution

  • The problem here is that the quotes are not properly escaped, and Commons CSV does not handle that. Try univocity-parsers, as it is the only parser for Java I know of that can handle unescaped quotes inside a quoted value. It is also 4 times faster than Commons CSV. Try this code:

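    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;
    import com.univocity.parsers.csv.UnescapedQuoteHandling;

    // configure the parser to keep reading a quoted value until its real closing quote
    CsvParserSettings settings = new CsvParserSettings();
    settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);

    // create the parser
    CsvParser parser = new CsvParser(settings);

    // parse your line
    String[] out = parser.parseLine("116,6,2,29 Sep 10,\"\"JJ\" (60 min)\",\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"");

    // print each parsed value on its own line
    for (String e : out) {
        System.out.println(e);
    }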
    

    This will print:

    116
    6
    2
    29 Sep 10
    "JJ" (60 min)
    http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj
    

    Hope it helps.

    Disclosure: I'm the author of this library. It's open source and free (Apache 2.0 license).
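
If switching parsers is not an option, another route is to repair the line before handing it to Commons CSV. The sketch below is one possible heuristic, not a feature of either library: `QuoteRepair` and `escapeInnerQuotes` are hypothetical names, and the code assumes a comma delimiter, one record per line, and that a quote inside a quoted field is a real closing quote only when it is followed by a comma or the end of the line. A stray quote that sits directly before a comma or another quote would be misread, so treat this as a starting point only.

```java
/** Hypothetical helper: doubles stray quotes inside quoted fields of one CSV line. */
final class QuoteRepair {

    static String escapeInnerQuotes(String line) {
        StringBuilder out = new StringBuilder(line.length() + 8);
        boolean inQuotes = false;
        boolean fieldStart = true; // true at line start and right after a delimiter
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            // Treat end of line like a delimiter when looking one character ahead.
            char next = (i + 1 == line.length()) ? ',' : line.charAt(i + 1);
            if (!inQuotes) {
                if (c == '"' && fieldStart) {
                    inQuotes = true; // opening quote of a quoted field
                }
                fieldStart = (c == ',');
                out.append(c);
            } else if (c != '"') {
                out.append(c); // ordinary character, commas included
            } else if (next == ',') {
                inQuotes = false; // assumed closing quote
                out.append(c);
            } else if (next == '"') {
                out.append("\"\""); // already an escaped pair: copy both
                i++;
            } else {
                out.append("\"\""); // stray quote: escape it by doubling
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "116,6,2,29 Sep 10,\"\"JJ\" (60 min)\",\"http://www.tvmaze.com/episodes/4855/criminal-minds-6x02-jj\"";
        System.out.println(escapeInnerQuotes(line));
    }
}
```

The repaired line can then be passed to `new CSVParser(new StringReader(fixed), CSVFormat.DEFAULT)` as in the question. Lines that are already well formed, including values with commas inside quotes, pass through unchanged under the stated assumptions.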