Search code examples
algorithmlanguage-agnostic

Applications of Longest Increasing Subsquence


How useful is the LIS (Longest Increasing Subsequence) problem in tackling other CS problems? There are a few algorithms, using patience sorting, dynamic programming or with decision trees. How are these used in real life -- maybe to data streams or something?

To remind you, I put in bold the longest increasing sequence

{0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15}.

As a bonus, is there any way to use the result that a sequence of length mn + 1 will have an increasing subsequence of length m or a decreasing subsequence of length n? E.g. Our list as length 16, so there should be an increasing sequence of length 5 or decreasing sequence of length 5. In our case 0,2,6,9,11,15.

Also an increasing sequence of length 8 or a decreasing sequence of length 3: in our case 12,10,1.


Solution

  • An interesting real-world application of LIS is Patience Diff, a diffing algorithm by Bram Cohen (the creator of BitTorrent) which is used in the Bazaar version control system.

    The regular diff algorithm involves computing the LCS (Longest Common Subsequence) between two documents. While being efficient, this approach has a problem, which is -- the results often happen to be not quite human-friendly.

    A simple example of how a regular diff may fail:

     void func1() {
         x += 1
    +}
    +
    +void functhreehalves() {
    +    x += 1.5
     }
    
     void func2() {
         x += 2
     }
    

    The advantage of the Patience Diff algorithm is that it allows to compute the differences more accurately, in a manner more closely corresponding to how a human would perform.

    In the previous case Patience Diff spots the difference better:

     void func1() {
         x += 1
     }
    
    +void functhreehalves() {
    +    x += 1.5
    +}
    +
     void func2() {
         x += 2
     }
    

    In a nutshell, the algorithm is:

    1. Find unique lines which are common to both documents.
    2. Take all such lines from the first document and order them according to their appearance in the second document.
    3. Compute the LIS of the resulting sequence (by doing a Patience Sort), getting the longest matching sequence of lines, a correspondence between the lines of two documents.
    4. Recurse the algorithm on each range of lines between already matched ones.

    Take a look at the code.