
Java ProcessBuilder - How to sort two files before diffing them


I have a Java process that finds the differences between two CSV files. It returns the rows that were added, changed, or deleted.

The relevant part of the code is:

ProcessBuilder pb = new ProcessBuilder("/usr/bin/diff", file1.toString(), file2.toString());
Process process;
        
try
{
    process = pb.start();
}
...

The problem is that the diff is not accurate unless both files are sorted first. To illustrate, say I have these two datasets:

DATA 1               DATA 2
"10000,x,x"          "10000,y,y"
"10000,y,y"          "10000,x,x"

The datasets contain the same rows, just in a different order. As a result, my current logic concludes that the row with ID 10000 was changed. The correct approach is to diff the sorted data, like so:

DATA 1               DATA 2
"10000,x,x"          "10000,x,x"
"10000,y,y"          "10000,y,y"
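To make the point above concrete without any external process, here is a minimal pure-Java sketch. It sorts both row lists and walks them in merge order, reporting rows that appear on only one side (similar in spirit to `comm -3`; it does not reproduce diff's output format). The class and method names are hypothetical, and it uses Java's natural `String` ordering rather than the locale-dependent ordering of `sort`; that is fine here because both sides are sorted the same way.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedRowCompare {

    // Returns a sorted copy so the caller's list is left untouched.
    static List<String> sorted(List<String> rows) {
        List<String> copy = new ArrayList<>(rows);
        copy.sort(null); // natural String order, not sort(1)'s locale order
        return copy;
    }

    // Merge-walks the two sorted lists: rows only in `left` get "< ", rows only in `right` get "> ".
    static List<String> onlyOnOneSide(List<String> left, List<String> right) {
        List<String> a = sorted(left);
        List<String> b = sorted(right);
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {        // same row on both sides: not a difference
                i++;
                j++;
            } else if (cmp < 0) {  // a's row is smaller, so it cannot appear later in b
                out.add("< " + a.get(i++));
            } else {               // b's row is smaller, so it cannot appear later in a
                out.add("> " + b.get(j++));
            }
        }
        while (i < a.size()) out.add("< " + a.get(i++)); // leftovers exist only in left
        while (j < b.size()) out.add("> " + b.get(j++)); // leftovers exist only in right
        return out;
    }

    public static void main(String[] args) {
        List<String> data1 = List.of("10000,x,x", "10000,y,y");
        List<String> data2 = List.of("10000,y,y", "10000,x,x");
        System.out.println(data1.equals(data2));          // false: same rows, different order
        System.out.println(onlyOnOneSide(data1, data2));  // [] once both sides are sorted
    }
}
```

Because duplicates are consumed one-for-one, a row that appears twice in one file and once in the other is still reported once.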

My question is: what is a working Java implementation equivalent to the following?

diff(sort(file1), sort(file2))
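One way to get diff(sort(file1), sort(file2)) without relying on a shell is to do the sorting in Java, write the sorted lines to temporary files, and run `/usr/bin/diff` (the path used in the question) on those. This is a minimal sketch, assuming UTF-8 input that fits in memory; `SortThenDiff`, `sortedLines`, `sortedCopy`, and `diffSorted` are hypothetical names. Java's natural `String` order may differ from `sort`'s locale order, which does not matter as long as both files are sorted the same way.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SortThenDiff {

    // Pure helper: returns a sorted copy of the lines (natural String order).
    static List<String> sortedLines(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        copy.sort(null);
        return copy;
    }

    // Reads the whole file (assumed UTF-8), sorts it, and writes the result to a new temp file.
    static Path sortedCopy(Path input) throws IOException {
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        Path tmp = Files.createTempFile("sorted-", ".csv");
        Files.write(tmp, sortedLines(lines), StandardCharsets.UTF_8);
        return tmp;
    }

    // Runs /usr/bin/diff on the sorted copies and returns diff's exit status.
    static int diffSorted(Path file1, Path file2) throws IOException, InterruptedException {
        Path sorted1 = null;
        Path sorted2 = null;
        try {
            sorted1 = sortedCopy(file1);
            sorted2 = sortedCopy(file2);
            ProcessBuilder pb = new ProcessBuilder("/usr/bin/diff", sorted1.toString(), sorted2.toString());
            pb.inheritIO();              // diff writes straight to this JVM's stdout/stderr
            return pb.start().waitFor(); // 0 = same, 1 = different, >1 = error
        } finally {
            // Clean up whichever temp files were actually created.
            if (sorted1 != null) Files.deleteIfExists(sorted1);
            if (sorted2 != null) Files.deleteIfExists(sorted2);
        }
    }

    public static void main(String[] args) throws Exception {
        System.exit(diffSorted(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

The trade-off versus a shell one-liner is that this needs no bash, but it loads each file fully into memory; for very large CSVs, letting `sort` do the work is likely the better choice.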


Solution

  • There seems to be a relatively straightforward bash solution:

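    // Paths are passed as positional parameters ($1, $2) instead of being concatenated
    // into the command, so spaces or shell metacharacters in file names are not interpreted.
    String cmd = "diff <(sort \"$1\") <(sort \"$2\")";
    ProcessBuilder pb = new ProcessBuilder("/bin/bash", "-c", cmd, "bash", file1.toString(), file2.toString());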
    

    It is equivalent to running:

    bash -c 'diff <(sort text1) <(sort text2)'
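    A sketch of running that bash one-liner from Java and collecting diff's output, assuming bash is at /bin/bash (process substitution `<(...)` is a bash feature, not plain `sh`). `BashSortedDiff`, `command`, and `run` are hypothetical names. Since diff exits with status 1 when the files differ, only statuses above 1 are treated as errors here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BashSortedDiff {

    // Builds the command; the paths become $1 and $2 ("bash" fills $0), so they are never re-parsed by the shell.
    static List<String> command(String file1, String file2) {
        return List.of("/bin/bash", "-c", "diff <(sort \"$1\") <(sort \"$2\")", "bash", file1, file2);
    }

    // Runs the command and returns diff's output lines.
    static List<String> run(String file1, String file2) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(file1, file2));
        pb.redirectErrorStream(true); // merge stderr into stdout so one reader drains everything
        Process process = pb.start();
        List<String> lines = new ArrayList<>();
        // Read all output before waitFor() so a full pipe buffer cannot block the child process.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = process.waitFor();
        if (exit > 1) { // 0 = identical, 1 = differences found, >1 = trouble
            throw new IOException("diff failed (exit " + exit + "): " + String.join("\n", lines));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        run(args[0], args[1]).forEach(System.out::println);
    }
}
```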