I am trying to read big CSV and TSV (tab-separated) files with about 1000000 rows or more. I tried to read a TSV file containing ~2500000 lines with opencsv, but it throws a java.lang.NullPointerException. It works with smaller TSV files of ~250000 lines. So I was wondering whether there are any other libraries that support reading huge CSV and TSV files. Do you have any ideas?
For anyone interested, here is my code (I shortened it, so the try-catch is obviously invalid):
    InputStreamReader in = null;
    CSVReader reader = null;
    try {
        in = this.replaceBackSlashes();
        reader = new CSVReader(in, this.seperator, '\"', this.offset);
        ret = reader.readAll();
    } finally {
        try {
            reader.close();
        }
    }
Edit: This is the method where I construct the InputStreamReader:
    private InputStreamReader replaceBackSlashes() throws Exception {
        FileInputStream fis = null;
        Scanner in = null;
        try {
            fis = new FileInputStream(this.csvFile);
            in = new Scanner(fis, this.encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while (in.hasNext()) {
                String nextLine = in.nextLine().replace("\\", "/");
                // nextLine = nextLine.replaceAll(" ", "");
                nextLine = nextLine.replaceAll("'", "");
                out.write(nextLine.getBytes());
                out.write("\n".getBytes());
            }
            return new InputStreamReader(new ByteArrayInputStream(out.toByteArray()));
        } catch (Exception e) {
            in.close();
            fis.close();
            this.logger.error("Problem at replaceBackSlashes", e);
        }
        throw new Exception();
    }
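A side note on the method above: it collects the entire cleaned file in a ByteArrayOutputStream before parsing even starts, which may add to the memory pressure with ~2500000 lines (this is an assumption; nothing in the post confirms the cause of the exception). Below is a minimal sketch of a streaming alternative that uses only the JDK. LineCleaningReader is a hypothetical name, not part of opencsv, and it applies the same two replacements one line at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical helper (not part of opencsv): applies the same per-line
 * replacements as replaceBackSlashes(), but lazily, so only one line
 * is held in memory at a time.
 */
class LineCleaningReader extends Reader {
    private final BufferedReader source;
    private String current = ""; // cleaned line currently being served
    private int pos = 0;         // read position inside 'current'

    LineCleaningReader(Reader source) {
        this.source = new BufferedReader(source);
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        // Fetch and clean the next line once the current one is used up.
        while (pos >= current.length()) {
            String line = source.readLine();
            if (line == null) {
                return -1; // end of input
            }
            // Same replacements as in the question: backslashes become
            // slashes, single quotes are dropped, newline is re-appended.
            current = line.replace("\\", "/").replace("'", "") + "\n";
            pos = 0;
        }
        int n = Math.min(len, current.length() - pos);
        current.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }
}
```

Since this is a plain java.io.Reader, it should be possible to pass something like new LineCleaningReader(new InputStreamReader(new FileInputStream(csvFile), encoding)) to the CSVReader constructor in place of the result of replaceBackSlashes(). Note that readAll() still keeps every row in memory; processing rows one by one with opencsv's readNext() would avoid that as well.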
I have not tried it myself, but I looked into Super CSV earlier:
http://sourceforge.net/projects/supercsv/
http://supercsv.sourceforge.net/
Check whether it works for your 2.5 million lines.