I have a Java process that finds the diff between 2 CSVs. It returns the rows that were added/changed/deleted.
The primary part of the code is the following:
ProcessBuilder pb = new ProcessBuilder("/usr/bin/diff", file1.toString(), file2.toString());
Process process = pb.start();
The problem is that the diff will not be accurate if the files are not sorted beforehand. To illustrate, say I have the following 2 datasets:
"10000,x,x" "10000,y,y"
"10000,y,y" "10000,x,x"
The two lists contain the same rows, but in a different order. As a consequence, my current logic thinks that the row with ID 10000 was changed. The correct way to apply the diff would be on the sorted data, like so...
"10000,x,x" "10000,x,x"
"10000,y,y" "10000,y,y"
My question is, what is a working implementation in Java that is equivalent to the following...
diff -> sort(file1) sort(file2)
It seems that there's a relatively straightforward bash approach:
String cmd = "diff <(sort " + file1.toString() + ") <(sort " + file2.toString() + ")";
ProcessBuilder pb = new ProcessBuilder("/bin/bash", "-c", cmd);
More clearly, it is the equivalent of...
bash -c 'diff <(sort text2) <(sort text1)'
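For comparison, here is a minimal pure-Java sketch of the same sort-then-compare idea, without shelling out to bash. It rests on several assumptions: both files fit in memory, rows are compared as whole lines, and the class name `SortedDiff` and the `< ` / `> ` prefixes are made up for illustration (this is not diff's real output format). Java's `String.compareTo` ordering can also differ from `sort`'s locale-aware collation, so the order of the reported lines may not match the bash version, although the set of differing lines should.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the original code.
public class SortedDiff {

    // Sorts copies of both inputs, then walks them in parallel.
    // "< " marks a line found only in left, "> " a line found only in right.
    public static List<String> diff(List<String> left, List<String> right) {
        List<String> a = new ArrayList<>(left);
        List<String> b = new ArrayList<>(right);
        Collections.sort(a);
        Collections.sort(b);

        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int c = a.get(i).compareTo(b.get(j));
            if (c == 0) {            // same line on both sides: skip it
                i++;
                j++;
            } else if (c < 0) {      // line exists only in the left input
                out.add("< " + a.get(i++));
            } else {                 // line exists only in the right input
                out.add("> " + b.get(j++));
            }
        }
        // Whatever remains on either side has no counterpart.
        while (i < a.size()) out.add("< " + a.get(i++));
        while (j < b.size()) out.add("> " + b.get(j++));
        return out;
    }

    // Convenience overload that reads both files fully into memory.
    public static List<String> diff(Path file1, Path file2) throws IOException {
        return diff(Files.readAllLines(file1), Files.readAllLines(file2));
    }

    public static void main(String[] args) throws IOException {
        diff(Paths.get(args[0]), Paths.get(args[1])).forEach(System.out::println);
    }
}
```

With this sketch, the two reordered datasets above produce an empty result, and a changed row shows up as one `< ` line plus one `> ` line rather than as a single "changed" entry.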