Given an input sequence, what is the best way to find the longest (not necessarily continuous) increasing subsequence
[0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15] # input
[1, 9, 13, 15] # an example of an increasing subsequence (not the longest)
[0, 2, 6, 9, 13, 15] # longest increasing subsequence (not a unique answer)
[0, 2, 6, 9, 11, 15] # another possible solution
I'm looking for the best algorithm. If there is code, Python would be nice, but anything is alright.
I just stumbled in this problem, and came up with this Python 3 implementation:
def subsequence(seq):
if not seq:
return seq
M = [None] * len(seq) # offset by 1 (j -> j-1)
P = [None] * len(seq)
# Since we have at least one element in our list, we can start by
# knowing that the there's at least an increasing subsequence of length one:
# the first element.
L = 1
M[0] = 0
# Looping over the sequence starting from the second element
for i in range(1, len(seq)):
# Binary search: we want the largest j <= L
# such that seq[M[j]] < seq[i] (default j = 0),
# hence we want the lower bound at the end of the search process.
lower = 0
upper = L
# Since the binary search will not look at the upper bound value,
# we'll have to check that manually
if seq[M[upper-1]] < seq[i]:
j = upper
else:
# actual binary search loop
while upper - lower > 1:
mid = (upper + lower) // 2
if seq[M[mid-1]] < seq[i]:
lower = mid
else:
upper = mid
j = lower # this will also set the default value to 0
P[i] = M[j-1]
if j == L or seq[i] < seq[M[j]]:
M[j] = i
L = max(L, j+1)
# Building the result: [seq[M[L-1]], seq[P[M[L-1]]], seq[P[P[M[L-1]]]], ...]
result = []
pos = M[L-1]
for _ in range(L):
result.append(seq[pos])
pos = P[pos]
return result[::-1] # reversing
Since it took me some time to understand how the algorithm works I was a little verbose with comments, and I'll also add a quick explanation:
seq
is the input sequence.L
is a number: it gets updated while looping over the sequence and it marks the length of longest incresing subsequence found up to that moment.M
is a list. M[j-1]
will point to an index of seq
that holds the smallest value that could be used (at the end) to build an increasing subsequence of length j
.P
is a list. P[i]
will point to M[j]
, where i
is the index of seq
. In a few words, it tells which is the previous element of the subsequence. P
is used to build the result at the end.How the algorithm works:
i
.j
that let seq[M[j]
be <
than seq[i]
.P
, M
and L
.Note: The only differences with the wikipedia algorithm are the offset of 1 in the M
list, and that X
is here called seq
. I also test it with a slightly improved unit test version of the one showed in Eric Gustavson answer and it passed all tests.
Example:
seq = [30, 10, 20, 50, 40, 80, 60]
0 1 2 3 4 5 6 <-- indexes
At the end we'll have:
M = [1, 2, 4, 6, None, None, None]
P = [None, None, 1, 2, 2, 4, 4]
result = [10, 20, 40, 60]
As you'll see P
is pretty straightforward. We have to look at it from the end, so it tells that before 60
there's 40,
before 80
there's 40
, before 40
there's 20
, before 50
there's 20
and before 20
there's 10
, stop.
The complicated part is on M
. At the beginning M
was [0, None, None, ...]
since the last element of the subsequence of length 1 (hence position 0 in M
) was at the index 0: 30
.
At this point we'll start looping on seq
and look at 10
, since 10
is <
than 30
, M
will be updated:
if j == L or seq[i] < seq[M[j]]:
M[j] = i
So now M
looks like: [1, None, None, ...]
. This is a good thing, because 10
have more chanches to create a longer increasing subsequence. (The new 1 is the index of 10)
Now it's the turn of 20
. With 10
and 20
we have subsequence of length 2 (index 1 in M
), so M
will be: [1, 2, None, ...]
. (The new 2 is the index of 20)
Now it's the turn of 50
. 50
will not be part of any subsequence so nothing changes.
Now it's the turn of 40
. With 10
, 20
and 40
we have a sub of length 3 (index 2 in M
, so M
will be: [1, 2, 4, None, ...]
. (The new 4 is the index of 40)
And so on...
For a complete walk through the code you can copy and paste it here :)