Tags: java, csv, parsing, inputstream, memory-efficient

How can I parse a CSV file in Java with low memory usage?


I read the file through an InputStream, but when a column contains a "," my parsing treats it as a column separator. For example, the row abc, xyz, "m,n" is parsed as abc, xyz, m, n, so m and n end up in separate columns.


Solution

  • I really like the Apache Commons CSVParser. This is almost verbatim from their user guide:

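    import java.io.FileReader;
    import java.io.Reader;
    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVParser;
    import org.apache.commons.csv.CSVRecord;

    // try-with-resources closes the parser and the reader, even on exceptions
    try (Reader reader = new FileReader("input.csv");
         CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT)) {
        for (final CSVRecord record : parser) {
            // Access by index: record.get("SomeColumn") only works once a header is configured
            final String string = record.get(0);
            // ...
        }
    }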
    

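    This is simple, configurable and record-oriented: the parser reads one record at a time as you iterate, rather than loading the whole file into memory.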

    You could configure it like this:

    final CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT.withHeader().withDelimiter(';'));
    

    For the record, this configuration is unnecessary in your case: CSVFormat.DEFAULT already uses a comma as the delimiter and treats a double-quoted field such as "m,n" as a single column.

    I would try this first and see whether it fits into memory. If it doesn't, can you be a little more specific about what you mean by a low memory footprint?
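
To make the quoting rule concrete, here is a minimal standard-library sketch of quote-aware splitting combined with line-by-line reading from an InputStream. This is not Commons CSV code and not a replacement for it: the class name QuotedCsvSketch and the methods splitLine and forEachRow are hypothetical, and the sketch assumes UTF-8 input and that a quoted field never spans multiple lines. It only illustrates why a comma inside double quotes stays in the same column, and why only one line needs to be in memory at a time.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class QuotedCsvSketch {

    /** Splits one line on commas, treating commas inside double quotes as data. */
    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // a doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas land here while quoted
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field on the line
        return fields;
    }

    /** Reads the stream one line at a time and hands each parsed row to the handler. */
    public static void forEachRow(InputStream in, Consumer<List<String>> handler) throws IOException {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(splitLine(line));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
            "abc,xyz,\"m,n\"\n".getBytes(StandardCharsets.UTF_8));
        // prints "3 fields: [abc, xyz, m,n]"
        forEachRow(in, row -> System.out.println(row.size() + " fields: " + row));
    }
}
```

Note that this sketch keeps whitespace around fields, so the row abc, xyz, "m,n" from the question yields " xyz" and " m,n" with their leading spaces. With real data, a tested parser such as Commons CSV is the safer choice, since it also handles quoted fields that contain line breaks.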