Search code examples
javaalgorithmcomparisonversioningrevision

What's the algorithm wikipedia uses for their version comparison feature


I am currently implementing some sort of text-version (revision) comparison visualizations and am trying to find some information about how wikipedia achieves their "View History"-feature in which they allow to compare the current revision with an older one.

You can find one example (About stackoverflow!) here:

http://en.wikipedia.org/w/index.php?title=Stack_Overflow&diff=512241244&oldid=458578615

I have implemented several ideas so far and also tried to reproduce the way wikipedia is doing it. For this I've implemented the Levenshtein-distance algorithm ( http://en.wikipedia.org/wiki/Levenshtein_distance ).

Lets assume I have two lists. I am iterating over the first list and check for the index-position of the first list on the second if the string found there is more than 50% equal. If it is, I'll just print both Strings side by side in my comparison view and continue with the next item of the first list. If it is not, I check the next item in the second list until I find it or leave the field for the second list blank if it cannot be found. (Although I would basically prefer that a sentence from the second list also always appears on the comparison view instead of leaving it out, just e.g. with a blank field for the first list field)

This method has some weaknesses. At first, if some sentence got deleted I would need to check the positions around the index for not simply "forgetting" it. But still I need to take care that text positions don't get inverted if I do so.

Has anyone of you tried to achieve something similar with java? If there are some code examples how others or you achieved it, I would gladly take a look to learn from it.

And of course, if you know anything about the algorithm wikipedia (and general wikis I assume?) uses for their revision comparison I'd be glad to hear it.


Solution

  • Wikipedia explains how the wiki difference engine works - http://en.wikipedia.org/wiki/Help:Diff

    You can follow the links at the bottom of the page to learn more, but this page lists the template used.