I have a program where I generate a huge matrix, and once it is calculated I have to reuse it later. For that reason, I want to cache it on the local hard disk so that I can read it back later. Currently I simply write the data to a file and read it back when needed.
But is there anything special I should take into consideration when doing this in Java? For example, do I need to serialize it, or do something else special? Is there anything I should take care of when storing important application data like this? Should it be plain ASCII/XML, or something else?
The data is not sensitive, however the integrity of the data is important.
If your data is really huge, I'd recommend some binary form: it will be smaller and faster to read and especially to parse (XML or JSON are many times slower than reading/writing binary data). Java serialization also brings a lot of overhead, so you might want to look at DataInputStream and DataOutputStream. If you know you will be writing only numbers of a specific type, or you know the sequence the data will be in, these are typically the fastest option.
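To make the size difference concrete, here is a small sketch (the class name BinaryVsText and the sample numbers are just for illustration) that encodes the same (row, col, value) entry once with DataOutputStream and once as a line of comma-separated text:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class BinaryVsText {
    // Encodes one (row, col, value) entry with DataOutputStream and returns its size in bytes.
    static int binaryEntrySize() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            dos.writeInt(123456);     // always 4 bytes
            dos.writeInt(654321);     // always 4 bytes
            dos.writeDouble(Math.PI); // always 8 bytes (IEEE 754 bit pattern)
        }
        return bytes.size();
    }

    // Encodes the same entry as one line of comma-separated text and returns its length.
    static int textEntrySize() {
        String line = 123456 + "," + 654321 + "," + Math.PI + "\n";
        return line.length();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("binary: " + binaryEntrySize() + " bytes"); // binary: 16 bytes
        System.out.println("text:   " + textEntrySize() + " chars");   // text:   32 chars
    }
}
```

DataOutputStream always uses 4 bytes per int and 8 per double regardless of the value, which also makes the layout trivial to read back with the matching DataInputStream methods; the text line is twice as long here and would additionally have to be parsed back, e.g. with Double.parseDouble.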
Do not forget to wrap the file streams (FileInputStream/FileOutputStream) in buffered streams (BufferedInputStream/BufferedOutputStream); they can make your reads and writes an order of magnitude faster.
Something like this (8192 is an example buffer size; you can tailor it to your needs):
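import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;

// assumes double[][] matrix holds your data
final File file = new File("matrix.bin"); // get file somehow
try (DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(file), 8192))) {
    for (int x = 0; x < matrix.length; x++) { // loop through your matrix (might be different if matrix is sparse)
        for (int y = 0; y < matrix[x].length; y++) {
            if (matrix[x][y] != 0.0) { // store only the non-zero elements
                dos.writeInt(x);
                dos.writeInt(y);
                dos.writeDouble(matrix[x][y]);
            }
        }
    }
    dos.writeInt(-1); // mark end (might be done differently); only written if the loop completed
} // try-with-resources closes (and thereby flushes) the stream, even if an exception is thrown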
And reading it back:
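import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;

// assumes double[][] matrix is already allocated with the original dimensions
final File file = new File("matrix.bin"); // get file somehow
try (DataInputStream dis = new DataInputStream(
        new BufferedInputStream(new FileInputStream(file), 8192))) {
    int x;
    while ((x = dis.readInt()) != -1) { // -1 is the end marker
        int y = dis.readInt();
        double value = dis.readDouble();
        matrix[x][y] = value; // store x, y, value in matrix
    }
}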
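For reference, here is the sparse variant assembled into one self-contained class that compiles and runs as-is. It is only a sketch: the class name SparseMatrixFile and its write/read methods are made up for this example, and since this format does not store the dimensions, read assumes the caller already knows them:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

class SparseMatrixFile {
    // Writes only non-zero entries as (row, col, value) triples, then -1 as an end marker.
    static void write(double[][] matrix, File file) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file), 8192))) {
            for (int x = 0; x < matrix.length; x++) {
                for (int y = 0; y < matrix[x].length; y++) {
                    if (matrix[x][y] != 0.0) {
                        dos.writeInt(x);
                        dos.writeInt(y);
                        dos.writeDouble(matrix[x][y]);
                    }
                }
            }
            dos.writeInt(-1);
        }
    }

    // Reads triples until the -1 marker into a rows x cols matrix (dimensions supplied by the caller).
    static double[][] read(File file, int rows, int cols) throws IOException {
        double[][] matrix = new double[rows][cols];
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file), 8192))) {
            int x;
            while ((x = dis.readInt()) != -1) {
                int y = dis.readInt();
                matrix[x][y] = dis.readDouble();
            }
        }
        return matrix;
    }

    public static void main(String[] args) throws IOException {
        double[][] original = new double[1000][1000];
        original[3][7] = 1.5;
        original[999][0] = -2.25;
        File file = File.createTempFile("matrix", ".bin");
        file.deleteOnExit();
        write(original, file);
        System.out.println(Arrays.deepEquals(original, read(file, 1000, 1000)));
        System.out.println(file.length());
    }
}
```

Running main prints true and 36: two entries of 4 + 4 + 8 = 16 bytes each plus the 4-byte end marker, instead of the 8,000,000 bytes a full dump of this 1000 × 1000 matrix would take.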
As correctly pointed out by Ryan Amos, if the matrix is not sparse, it could be faster to just write the values (but all of them):
Out:
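// assumes a rectangular double[][] matrix of size xSize x ySize
dos.writeInt(xSize); // writeInt, not write: write(int) stores only the lowest byte
dos.writeInt(ySize);
for (int x = 0; x < xSize; x++) {
    for (int y = 0; y < ySize; y++) {
        dos.writeDouble(matrix[x][y]); // always 8 bytes per value
    }
}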
In:
int xSize = dis.readInt();
int ySize = dis.readInt();
double[][] matrix = new double[xSize][ySize]; // allocate using the stored dimensions
for (int x = 0; x < xSize; x++) {
    for (int y = 0; y < ySize; y++) {
        matrix[x][y] = dis.readDouble();
    }
}
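The dense variant as one runnable class, again only a sketch (DenseMatrixFile is a hypothetical name). It stores the dimensions as a header so the reader can allocate the array itself, assumes a rectangular matrix, and adds a cheap sanity check on the header:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

class DenseMatrixFile {
    // Layout: int xSize, int ySize, then xSize*ySize doubles in row-major order.
    static void write(double[][] matrix, File file) throws IOException {
        int xSize = matrix.length;
        int ySize = xSize == 0 ? 0 : matrix[0].length; // assumes a rectangular matrix
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file), 8192))) {
            dos.writeInt(xSize);
            dos.writeInt(ySize);
            for (int x = 0; x < xSize; x++) {
                for (int y = 0; y < ySize; y++) {
                    dos.writeDouble(matrix[x][y]);
                }
            }
        }
    }

    static double[][] read(File file) throws IOException {
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file), 8192))) {
            int xSize = dis.readInt();
            int ySize = dis.readInt();
            if (xSize < 0 || ySize < 0) { // reject an obviously damaged header
                throw new IOException("invalid dimensions: " + xSize + " x " + ySize);
            }
            double[][] matrix = new double[xSize][ySize];
            for (int x = 0; x < xSize; x++) {
                for (int y = 0; y < ySize; y++) {
                    matrix[x][y] = dis.readDouble();
                }
            }
            return matrix;
        }
    }

    public static void main(String[] args) throws IOException {
        double[][] original = new double[3][4];
        for (int x = 0; x < 3; x++) {
            for (int y = 0; y < 4; y++) {
                original[x][y] = x * 10 + y + 0.5;
            }
        }
        File file = File.createTempFile("dense", ".bin");
        file.deleteOnExit();
        write(original, file);
        System.out.println(Arrays.deepEquals(original, read(file)));
        System.out.println(file.length());
    }
}
```

For the 3 × 4 example, main prints true and 104 (an 8-byte header plus 12 values of 8 bytes each).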
(Mind that I have not compiled this, so you might need to correct some details; it is off the top of my head.)
Without buffers, DataInputStream pulls its data from the file a few bytes (or a single byte) per call, which makes it slow.
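One way to see the effect of buffering is to count how many read calls actually reach the underlying stream. The sketch below uses an in-memory ByteArrayInputStream in place of a file and a small hypothetical CountingInputStream wrapper. The exact counts depend on the JDK's DataInputStream implementation, but without buffering there is at least one call per readInt, while with an 8192-byte BufferedInputStream the 4,000 bytes typically arrive in a single bulk read. Against a real FileInputStream each of those calls goes to the operating system, which is where the slowdown comes from:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferingDemo {
    // Hypothetical helper: counts read calls that reach the wrapped stream.
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Serializes the ints 0..count-1, standing in for a matrix file.
    static byte[] sampleData(int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                dos.writeInt(i);
            }
        }
        return bytes.toByteArray();
    }

    // Reads `count` ints and returns how many read calls hit the underlying stream.
    static int underlyingCalls(byte[] data, int count, boolean buffered) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(data));
        InputStream source = buffered ? new BufferedInputStream(counter, 8192) : counter;
        try (DataInputStream dis = new DataInputStream(source)) {
            for (int i = 0; i < count; i++) {
                dis.readInt();
            }
        }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = sampleData(1000);
        System.out.println("unbuffered calls: " + underlyingCalls(data, 1000, false));
        System.out.println("buffered calls:   " + underlyingCalls(data, 1000, true));
    }
}
```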
One more comment: with such a huge dataset, you should consider using a SparseMatrix and writing/reading only the non-zero elements (unless you really have that many significant elements).
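Whether "that many significant elements" tips the balance can be estimated from file size alone. Assume the sparse layout writes two ints and one double per non-zero element plus a 4-byte end marker of -1, and the dense layout writes two int dimensions followed by every value as a double; m × n is the matrix size and k the number of non-zero elements (symbols introduced here for illustration):

```latex
\begin{aligned}
S_{\text{sparse}} &= (4 + 4 + 8)\,k + 4 = 16k + 4 \\
S_{\text{dense}}  &= 4 + 4 + 8\,mn = 8mn + 8 \\
S_{\text{sparse}} < S_{\text{dense}} &\iff 16k + 4 < 8mn + 8 \iff k < \tfrac{mn}{2} + \tfrac{1}{4}
\end{aligned}
```

So, judging by bytes on disk, the sparse format wins roughly whenever fewer than half of the elements are non-zero. For example, a 10,000 × 10,000 matrix with 1 % non-zero elements (k = 10^6) gives a dense file of 800,000,008 bytes (about 800 MB) versus a sparse file of 16,000,004 bytes (about 16 MB).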
As I wrote in the comment above, if you really want to write/read every single element of a matrix that size, then you are already talking about hours of writing rather than seconds.
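Since the question says the integrity of the data is important, one option is to store a checksum after the data and verify it on load, and to write to a temporary file that is moved over the real one only once it is complete, so a crash mid-write cannot leave a half-written cache behind. Below is a minimal sketch using the JDK's java.util.zip.CRC32 with CheckedOutputStream/CheckedInputStream and Files.move. The class name ChecksummedMatrixFile and the ".tmp" suffix are just choices for this example. StandardCopyOption.ATOMIC_MOVE only works where the file system supports it (otherwise Files.move throws AtomicMoveNotSupportedException), and per the Files.move documentation, whether an existing target is replaced under ATOMIC_MOVE is implementation specific:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

class ChecksummedMatrixFile {
    // Layout: int rows, int cols, rows*cols doubles, then a long CRC32 of all preceding bytes.
    static void write(double[][] matrix, Path target) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        CRC32 crc = new CRC32();
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length; // assumes a rectangular matrix
        try (DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(
                new CheckedOutputStream(Files.newOutputStream(tmp), crc), 8192))) {
            dos.writeInt(rows);
            dos.writeInt(cols);
            for (double[] row : matrix) {
                for (double value : row) {
                    dos.writeDouble(value);
                }
            }
            dos.flush(); // push buffered bytes through the CheckedOutputStream so crc covers them
            dos.writeLong(crc.getValue());
        }
        // Replace the old cache only after the new file has been completely written.
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    static double[][] read(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (DataInputStream dis = new DataInputStream(new CheckedInputStream(
                new BufferedInputStream(Files.newInputStream(file), 8192), crc))) {
            int rows = dis.readInt();
            int cols = dis.readInt();
            if (rows < 0 || cols < 0) {
                throw new IOException("corrupt header: " + rows + " x " + cols);
            }
            double[][] matrix = new double[rows][cols];
            for (int x = 0; x < rows; x++) {
                for (int y = 0; y < cols; y++) {
                    matrix[x][y] = dis.readDouble();
                }
            }
            long expected = crc.getValue(); // CRC of the bytes consumed so far
            if (dis.readLong() != expected) {
                throw new IOException("checksum mismatch: cache file is corrupt");
            }
            return matrix;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("matrix-cache").resolve("matrix.bin");
        double[][] original = {{1.0, 2.5, -3.0, 0.0}, {4.0, 5.5, 6.0, 7.25}, {8.0, 9.0, 10.5, 11.0}};
        write(original, file);
        System.out.println(Arrays.deepEquals(original, read(file)));

        byte[] bytes = Files.readAllBytes(file);
        bytes[20] ^= 1; // flip one bit inside the stored values
        Files.write(file, bytes);
        try {
            read(file);
            System.out.println("corruption not detected");
        } catch (IOException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

A CRC catches accidental damage such as a flipped bit, while a truncated file surfaces as an EOFException from the reads; it does not protect against deliberate tampering, which seems acceptable here since the data is not sensitive.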