I have two files which are very large in size say 50000 lines each. I need to compare these two files and identify the changes. However, the catch is if a line is present at different position, it should not be shown as different.
For eg, consider this
File A.txt
xxxxx
yyyyy
zzzzz
File B.txt
zzzzz
xxxx
yyyyy
So if this is the content of the file. My code should give the output as xxxx(or both xxxx and xxxxx).
Ofcourse the easiest way would be storing each line of the file in a
List< String>
and comparing with the other
List< String>.
But this seems to be taking a lot of time. I have also tried using the DiffUtils in java. But it doesnt recognize the lines present in diferent line numbers as same. So is there any other algorithm that might help me?
probably using Set
is the easiest way:
Set<String> set1 = new HashSet<String>(FileUtils.readLines(file1));
Set<String> set2 = new HashSet<String>(FileUtils.readLines(file2));
Set<String> similars = new HashSet<String>(set1);
similars.retainAll(set2);
set1.removeAll(similars); //now set1 contains distinct lines in file1
set2.removeAll(similars); //now set2 contains distinct lines in file2
System.out.println(set1); //prints distinct lines in file1;
System.out.println(set2); //prints distinct lines in file2