I have a Java process that finds the diff between 2 CSVs. It returns the rows that were added/changed/deleted.
The primary part of the code is the following:
ProcessBuilder pb = new ProcessBuilder("/usr/bin/diff", file1.toString(), file2.toString());
Process process;
try
{
    process = pb.start();
}
...
The problem is that the diff will not be accurate if the files are not sorted beforehand. To illustrate, say I have the following 2 datasets:
DATA 1         DATA 2
"10000,x,x"    "10000,y,y"
"10000,y,y"    "10000,x,x"
The lists are the same, but they are in a different order. As a consequence, my current logic will think that the row with ID 10000 was changed. The correct way to apply the diff would be on the sorted data, like so:
DATA 1         DATA 2
"10000,x,x"    "10000,x,x"
"10000,y,y"    "10000,y,y"
My question is, what is a working implementation in Java that is equivalent to the following...
diff -> sort(file1) sort(file2)
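For comparison, the same idea can be sketched in plain Java without calling an external process. This is a minimal sketch, not a drop-in replacement for `diff`: the class `SortedCsvDiff`, its `Result` type, and the method names are hypothetical, and it assumes both files fit in memory and that a row counts as "the same" only if the whole line matches exactly. A changed row shows up as one line only in the first file and one line only in the second.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of the original code.
final class SortedCsvDiff {

    /** Lines found in only one of the two inputs, after sorting both. */
    static final class Result {
        final List<String> onlyInFirst = new ArrayList<>();
        final List<String> onlyInSecond = new ArrayList<>();
    }

    /** Sorts copies of both lists, then walks them in step like a merge. */
    static Result diff(List<String> first, List<String> second) {
        List<String> a = new ArrayList<>(first);
        List<String> b = new ArrayList<>(second);
        Collections.sort(a);
        Collections.sort(b);

        Result result = new Result();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {
                // Same line on both sides: not a difference.
                i++;
                j++;
            } else if (cmp < 0) {
                // Line exists only in the first input (deleted).
                result.onlyInFirst.add(a.get(i++));
            } else {
                // Line exists only in the second input (added).
                result.onlyInSecond.add(b.get(j++));
            }
        }
        // Whatever is left on either side has no counterpart.
        while (i < a.size()) {
            result.onlyInFirst.add(a.get(i++));
        }
        while (j < b.size()) {
            result.onlyInSecond.add(b.get(j++));
        }
        return result;
    }

    /** Convenience overload that reads both files fully into memory. */
    static Result diff(Path file1, Path file2) throws IOException {
        return diff(Files.readAllLines(file1), Files.readAllLines(file2));
    }
}
```

Calling `SortedCsvDiff.diff(file1, file2)` on the two datasets above would report no differences, since both sides contain the same lines once sorted. Because the walk compares sorted lists rather than sets, duplicate rows are counted, so a row that appears twice in one file and once in the other is still reported.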
It seems that there's a relatively straightforward bash solution:
String cmd = "diff <(sort " + file1.toString() + ") <(sort " + file2.toString() + ")";
ProcessBuilder pb = new ProcessBuilder("/bin/bash", "-c", cmd);
More clearly, it is equivalent to:
bash -c 'diff <(sort file1) <(sort file2)'
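One caveat with building the command string by concatenation is that a path containing spaces or shell metacharacters will break it. A possible variation, sketched below, keeps the script text fixed and passes the paths as positional parameters (`$1`, `$2`) to `bash -c`. The class `BashSortedDiff` and its methods are hypothetical names, and the sketch assumes `/bin/bash`, `diff`, and `sort` are available and that Java 9+ is used (for `InputStream.readAllBytes`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the original code.
final class BashSortedDiff {

    /** Builds the argument list; the file paths never enter the script text. */
    static List<String> buildCommand(Path file1, Path file2) {
        return Arrays.asList(
            "/bin/bash", "-c",
            "diff <(sort \"$1\") <(sort \"$2\")",
            "bash",               // becomes $0 inside the script
            file1.toString(),     // becomes $1
            file2.toString());    // becomes $2
    }

    /** Runs the command and returns diff's output as text. */
    static String run(Path file1, Path file2) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(file1, file2))
            .redirectErrorStream(true) // merge stderr into stdout
            .start();
        // Read all output before waiting, so the child cannot block on a full pipe.
        byte[] output = process.getInputStream().readAllBytes();
        int exitCode = process.waitFor();
        // diff exits with 0 (no differences), 1 (differences), or 2 (trouble).
        if (exitCode > 1) {
            throw new IOException("diff failed with exit code " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }
}
```

With this shape, `BashSortedDiff.run(file1, file2)` returns an empty string when the sorted files match. Note that a failure inside one of the `sort` process substitutions does not change the exit code, which comes from `diff` alone.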