
What is the fastest way to compare two text files, not counting moved lines as different?


I have two very large files, say 50,000 lines each. I need to compare them and identify the changes. The catch is that if a line is present at a different position, it should not be shown as different.

For example, consider these two files:
File A.txt:

    xxxxx
    yyyyy
    zzzzz

File B.txt:

    zzzzz
    xxxx
    yyyyy

Given these contents, my code should output xxxx (or both xxxx and xxxxx).

Of course, the easiest way would be to store each line of each file in a List<String> and compare one list with the other.

But this seems to take a lot of time. I have also tried DiffUtils in Java, but it doesn't treat lines at different line numbers as the same. Is there any other algorithm that might help me?
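The question doesn't show the List-based code, so here is a hypothetical reconstruction, assuming each line is looked up with `List.contains`. It illustrates why the approach is slow: `contains` scans the other list linearly, so two 50,000-line files can need up to 50,000 × 50,000 = 2.5 billion string comparisons. The class name `NaiveListDiff` and method `onlyIn` are made up for illustration; `List.of` needs Java 9+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the slow List-based comparison.
class NaiveListDiff {
    // Returns the lines of 'a' that do not appear anywhere in 'b'.
    // List.contains is a linear scan, so this costs O(a.size() * b.size()).
    static List<String> onlyIn(List<String> a, List<String> b) {
        List<String> result = new ArrayList<>();
        for (String line : a) {
            if (!b.contains(line)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fileA = List.of("xxxxx", "yyyyy", "zzzzz");
        List<String> fileB = List.of("zzzzz", "xxxx", "yyyyy");
        System.out.println(onlyIn(fileA, fileB)); // [xxxxx]
        System.out.println(onlyIn(fileB, fileA)); // [xxxx]
    }
}
```

The result is correct (position is ignored); only the running time is the problem, which is what hashing fixes.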


Solution

  • Probably the easiest way is to use a Set:

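    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.commons.io.FileUtils; // Apache Commons IO

    public class CompareFiles {
        public static void main(String[] args) throws IOException {
            File file1 = new File(args[0]);
            File file2 = new File(args[1]);
            // Read every line into a set; UTF-8 encoding is assumed
            Set<String> set1 = new HashSet<>(FileUtils.readLines(file1, StandardCharsets.UTF_8));
            Set<String> set2 = new HashSet<>(FileUtils.readLines(file2, StandardCharsets.UTF_8));

            // Lines present in both files, regardless of position
            Set<String> similars = new HashSet<>(set1);
            similars.retainAll(set2);

            set1.removeAll(similars); // now set1 contains lines found only in file1
            set2.removeAll(similars); // now set2 contains lines found only in file2
            System.out.println(set1); // prints lines found only in file1
            System.out.println(set2); // prints lines found only in file2
        }
    }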
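One caveat with the Set approach: a set collapses duplicate lines, so if file1 contains a line twice and file2 only once, the extra copy is not reported. If that matters, an alternative (a swapped-in technique, not part of the answer above) is to treat each file as a multiset and count occurrences in a HashMap. The sketch below uses only the JDK, assumes UTF-8 input (what `Files.readAllLines(Path)` expects), and the names `LineCountDiff` and `onlyIn` are hypothetical. It runs in expected O(n + m) time and keeps the lines in file order.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LineCountDiff {
    // Returns the lines of 'a' that have no matching occurrence in 'b',
    // one entry per unmatched occurrence, in the order they appear in 'a'.
    static List<String> onlyIn(List<String> a, List<String> b) {
        // Multiset of lines in b: line -> number of not-yet-matched copies
        Map<String, Integer> remaining = new HashMap<>();
        for (String line : b) {
            remaining.merge(line, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        for (String line : a) {
            Integer n = remaining.get(line);
            if (n == null || n == 0) {
                result.add(line); // no unmatched copy left in b
            } else {
                remaining.put(line, n - 1); // consume one matching copy
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<String> fileA = Files.readAllLines(Paths.get(args[0]));
        List<String> fileB = Files.readAllLines(Paths.get(args[1]));
        System.out.println(onlyIn(fileA, fileB)); // lines only in the first file
        System.out.println(onlyIn(fileB, fileA)); // lines only in the second file
    }
}
```

For example, `onlyIn(List.of("a", "a", "b"), List.of("a", "b"))` returns `[a]`, whereas the Set-based code reports no difference for those inputs. On the files from the question it gives `[xxxxx]` and `[xxxx]`, the same as the Set version.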