java csv large-files opencsv

Good and effective CSV/TSV Reader for Java


I am trying to read large CSV and TSV (tab-separated) files with 1,000,000 rows or more. I tried to read a TSV file containing ~2,500,000 lines with opencsv, but it throws a java.lang.NullPointerException. It works with smaller TSV files of ~250,000 lines. So I was wondering whether there are other libraries that support reading huge CSV and TSV files. Do you have any ideas?

For anyone interested, here is my code (I shortened it, so the try/catch is obviously incomplete):

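CSVReader reader = null;
try {
    InputStreamReader in = this.replaceBackSlashes();
    reader = new CSVReader(in, this.seperator, '\"', this.offset);
    // readAll() keeps every parsed row in memory at the same time
    ret = reader.readAll();
} finally {
    // Null check: if anything above fails before reader is assigned,
    // an unguarded close() throws a NullPointerException here,
    // and that exception replaces (hides) the original error.
    if (reader != null) {
        reader.close();
    }
}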

Edit: This is the method where I construct the InputStreamReader:

private InputStreamReader replaceBackSlashes() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // try-with-resources closes the scanner (and the file stream) on every path,
        // so no close() call can hit a null reference
        try (Scanner in = new Scanner(new FileInputStream(this.csvFile), this.encoding)) {
            while (in.hasNextLine()) {
                String nextLine = in.nextLine().replace("\\", "/");
                // nextLine = nextLine.replaceAll(" ", "");
                nextLine = nextLine.replace("'", "");
                // Encode with the same charset that was used for decoding
                out.write(nextLine.getBytes(this.encoding));
                out.write("\n".getBytes(this.encoding));
            }
        } catch (Exception e) {
            this.logger.error("Problem at replaceBackSlashes", e);
            throw e; // keep the original exception and its stack trace
        }
        // Note: the whole cleaned file is held in memory at this point
        return new InputStreamReader(new ByteArrayInputStream(out.toByteArray()), this.encoding);
    }
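
The method above copies the entire file into a byte array before parsing starts. As a minimal sketch of an alternative, the same two character replacements (backslash to forward slash, single quotes dropped) can be applied on the fly by wrapping the file reader. `CleaningReader` is a hypothetical name, not part of opencsv or the JDK, and unlike the Scanner version it does not normalize line endings.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper: cleans characters while streaming instead of buffering the file. */
class CleaningReader extends FilterReader {

    CleaningReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        // Route single-character reads through the cleaning logic below
        char[] one = new char[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0];
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            int n = super.read(buf, off, len);
            if (n == -1) {
                return -1; // end of input
            }
            int kept = 0;
            for (int i = 0; i < n; i++) {
                char c = buf[off + i];
                if (c == '\'') {
                    continue; // drop single quotes
                }
                // Replace backslashes, copy everything else unchanged
                buf[off + kept++] = (c == '\\') ? '/' : c;
            }
            if (kept > 0) {
                return kept;
            }
            // The chunk held only dropped characters, so read the next chunk
        }
    }
}
```

Assuming the same fields as in the question, it could be passed straight to the parser: `new CSVReader(new CleaningReader(new InputStreamReader(new FileInputStream(this.csvFile), this.encoding)), this.seperator, '\"', this.offset)`.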

Solution

  • I have not tried it, but I looked into Super CSV earlier.

    http://sourceforge.net/projects/supercsv/

    http://supercsv.sourceforge.net/

    Check whether it works for your 2.5 million lines.
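
Whichever library is used, loading all 2.5 million rows into one list is costly, so it may help to handle rows one at a time. Here is a minimal sketch using only the JDK. `TsvStreamer` and `forEachRow` are hypothetical names, and the code assumes plain fields with no quoting and no separators or line breaks inside a field, so it is not a replacement for a real CSV parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of opencsv or Super CSV. */
class TsvStreamer {

    /**
     * Reads the input line by line and hands each row to the handler,
     * so only the current row is held in memory.
     * Assumption: no quoted fields, no separator or line break inside a field.
     * Returns the number of rows processed.
     */
    static long forEachRow(Reader source, char separator, Consumer<String[]> handler)
            throws IOException {
        // Quote the separator so characters like '|' are not treated as regex syntax
        String pattern = Pattern.quote(String.valueOf(separator));
        long rows = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Limit -1 keeps trailing empty fields
                handler.accept(line.split(pattern, -1));
                rows++;
            }
        }
        return rows;
    }
}
```

A call such as `TsvStreamer.forEachRow(in, '\t', row -> process(row))` would then visit each row once, where `process` stands for whatever per-row work is needed.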