I have to compare two CSV files of 2-3 GB each on a Windows machine.
I tried loading the first one into a HashMap and comparing the second one against it, but, as expected, memory consumption was very high.
The goal is to write the differences to another file.
The lines may appear in a different order, and some may be missing.
Any suggestions?
Assuming you want to do this programmatically in Java, the approach depends on whether the files are sorted.
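Are both files sorted? If so, you don't need to read either file in whole. Start at the beginning of both files and walk through them in step, comparing the current line of each: equal lines match, and when they differ, the line that sorts first is absent from the other file, so record it and advance only that file.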
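A minimal sketch of that walk, assuming both files are UTF-8 and were sorted with the same ordering as Java's String.compareTo (a file sorted by a different tool or collation can make this report spurious differences). The class name SortedDiff and the "< " / "> " output prefixes are illustrative choices, not an existing API. Only one line from each file is held in memory at any time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SortedDiff {

    /**
     * Compares two files whose lines are already sorted (by String.compareTo)
     * and writes each line found in only one of them to out:
     * "< line" means only in a, "> line" means only in b.
     */
    public static void diffSorted(Path a, Path b, Path out) throws IOException {
        try (BufferedReader ra = Files.newBufferedReader(a, StandardCharsets.UTF_8);
             BufferedReader rb = Files.newBufferedReader(b, StandardCharsets.UTF_8);
             BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String la = ra.readLine();
            String lb = rb.readLine();
            while (la != null || lb != null) {
                int cmp;
                if (la == null) cmp = 1;        // a exhausted: the rest of b is extra
                else if (lb == null) cmp = -1;  // b exhausted: the rest of a is missing from b
                else cmp = la.compareTo(lb);

                if (cmp == 0) {                 // same line in both files: no difference
                    la = ra.readLine();
                    lb = rb.readLine();
                } else if (cmp < 0) {           // la sorts first, so it cannot be in b
                    w.write("< " + la);
                    w.newLine();
                    la = ra.readLine();
                } else {                        // lb sorts first, so it cannot be in a
                    w.write("> " + lb);
                    w.newLine();
                    lb = rb.readLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SortedDiff first.csv second.csv differences.txt
        diffSorted(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
    }
}
```

If the CSVs are not UTF-8 (for example, exported in a Windows code page), pass the matching Charset instead, otherwise the reader will throw on malformed input.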
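If your files are not sorted, you could sort them before the diff. Again, since you need a low-memory solution, don't read an entire file into memory to sort it. Split the file into manageable chunks, sort each chunk in memory and write it to a temporary file, then merge the sorted chunks into a single sorted file (an external merge sort). The final step is a k-way merge, not an insertion sort: insertion sort would need the data in memory and is quadratic.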
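A minimal external merge sort sketch, assuming UTF-8 input and whole-line ordering by String.compareTo. The class name ExternalSort and the maxLines parameter are hypothetical; maxLines is a tuning knob you would size to your heap. The merge phase uses a java.util.PriorityQueue so that only one line per chunk is in memory at a time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {

    /** One open sorted chunk plus the line currently at its head. */
    private static final class Chunk {
        final BufferedReader reader;
        String head;
        Chunk(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }
    }

    /** Sorts the lines of in into out, holding at most maxLines lines in memory per chunk. */
    public static void sort(Path in, Path out, int maxLines) throws IOException {
        List<Path> chunks = new ArrayList<>();
        // Phase 1: read fixed-size chunks, sort each in memory, write each to a temp file.
        try (BufferedReader r = Files.newBufferedReader(in, StandardCharsets.UTF_8)) {
            List<String> buf = new ArrayList<>(maxLines);
            String line;
            while ((line = r.readLine()) != null) {
                buf.add(line);
                if (buf.size() == maxLines) {
                    chunks.add(writeSortedChunk(buf));
                    buf.clear();
                }
            }
            if (!buf.isEmpty()) chunks.add(writeSortedChunk(buf));
        }
        // Phase 2: k-way merge, always emitting the smallest head among all chunks.
        PriorityQueue<Chunk> pq = new PriorityQueue<>((x, y) -> x.head.compareTo(y.head));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (Path p : chunks) {
                BufferedReader br = Files.newBufferedReader(p, StandardCharsets.UTF_8);
                readers.add(br);
                Chunk c = new Chunk(br);
                if (c.head != null) pq.add(c);
            }
            while (!pq.isEmpty()) {
                Chunk c = pq.poll();
                w.write(c.head);
                w.newLine();
                c.head = c.reader.readLine();
                if (c.head != null) pq.add(c); // re-insert with its new head
            }
        } finally {
            // Close readers before deleting: Windows will not delete open files.
            for (BufferedReader br : readers) br.close();
            for (Path p : chunks) Files.deleteIfExists(p);
        }
    }

    private static Path writeSortedChunk(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path tmp = Files.createTempFile("chunk", ".txt");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        return tmp;
    }
}
```

Sorting both CSV files this way produces two files in the same order, which can then be compared with the sorted-file walk described in the previous paragraph. If the files have a header row, it gets sorted along with the data, so you may want to strip it first.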