Search code examples
pythonalgorithmbinary-searchtheorylis

longest increasing subsequence problem - n log n solution that returns the actual subsequence - explanation/clarification needed


I've tried to implement the n log n solution to the longest increasing subsequence problem (where you need to find a longest subsequence where each element is larger than the previous one of a given sequence), one which will find the actual subsequence that is longest (not just its length). I've worked off of this video - https://www.youtube.com/watch?v=S9oUiVYEq7E - but sadly I don't think the algorithm shown on the video is correct - it seems to at least work for the exact input that is shown on the video, but doesn't work for others, such as [1, 8, 6, 4, 9, 8, 3, 5, 2, 7, 1, 9, 5, 7].

from bisect import bisect_left, bisect_right
from math import floor, ceil


sequence = [3, 4, -1, 5, 8, 2, 3, 12, 7, 9, 10]
indexes = [0]
helper = [-1] * len(sequence)
for i in range(1, len(sequence)):
    if len(indexes) == 0 or sequence[i] > sequence[indexes[-1]]:
        indexes.append(i)
        helper[i] = indexes[-2]
    else:
        ceiltable = bisect_right([sequence[x] for x in indexes], sequence[i])
        indexes[ceiltable] = i
        if ceiltable > 0:
            helper[i] = indexes[ceiltable - 1]



solution = [sequence[x] for x in indexes]


print(f"longest increasing subsequence is {solution}, and has a lenght of  {len(solution)}")

And my question(s) are - can anyone confirm/disconfirm whether the algorithm shown in that video is actually correct and what might be wrong with my implementation of it? Also, can I ask anyone to provide a simple explanation/pseudocode/mockup of the n log n solution of this problem? I tried searching of course, But I don't think there is anything that really explains how this solution works, or specifically how an implementation of it would work - and again, just to note, I also have to return the actual subsequence, not just the length.


Solution

  • The video you refer to explains he algorithm correctly.

    There are two issues in your implementation:

    • You should use bisect_left instead of bisect_right, as otherwise you will allow solutions that are in fact non-decreasing sequences (with potential duplicate values) instead of strictly increasing sequences. And bisect_right may also result in an index that is equal to the length of the list, resulting in an invalid index access error. (Side note: If you really want to use bisect_right and find non-decreasing sequences, then make the preceding if condition >=)

    • The code does not translate the gathered data correctly into a solution. You really need to use the helper list to trace back the solution. Here is the code you could use:

      solution = []
      i = indexes[-1]  # start with the index of the last value of the longest sequence
      while i >= 0:
          solution.append(sequence[i])
          i = helper[i]  # Look up what the optimal predecessor is of that index
      solution.reverse()  # Reverse the solution since we populated it in reverse order
      

    Other remark

    The way you perform binary search is not efficient because you iterate the whole list with a list comprehension. That defeats the efficiency offered by binary search. You should have the values ready for binary search, so keep them in a separate list that you maintain throughout the algorithm, and don't do such a list comprehension any more at the time of the binary search.