Search code examples
algorithmlcslis

Node structure in longest increasing subsequence algorithm (Jacobson & Vo)


I have a problem in understanding the node structure for the calculation of the complete longest increasing subsequence (lis) in the paper "Heaviest Increasing/Common Subsequence Problems" by Jacobson and Vo.

Here is the pseudo code from the paper:

Algorithm[1]

What is meant with

node is an auxiliary array that, for each element in L, contains a record of an element that precedes this element in an increasing subsequence. The function newnode() constructs such records and links them into a directed graph. At the end of the algorithm, we can search from the maximal element of L to recover an LIS of sigma.

? How would you implement this structure?

Do I have to construct a directed graph with all the elements of the sequence as vertices (plus a nil-vertex) and edges "\sigma_i -> s" and then search for the longest path starting at the maximal element of L (and ending at nil)? Isn't there a more effective way to get the complete lis?


My second question: Is this algorithm as fast as the algorithm described in wikipedia? If not: Can I modify the algorithm from wikipedia as well to calculate heaviest common subsequences as described in the paper?


Solution

  • I'd implement the array with an array and the graph as a singly-linked list with structure sharing. To wit, in generic C++,

    #include <algorithm>
    #include <iostream>
    #include <memory>
    #include <utility>
    #include <vector>
    
    template <typename T>
    struct Node {
      explicit Node(const T& e) : element(e) {}
    
      T element;
      std::size_t index = 0;
      Node* previous = nullptr;
    };
    
    template <typename T, typename Compare>
    std::vector<T> LongestIncreasingSubsequence(const std::vector<T>& elements,
                                                Compare compare) {
      if (elements.empty()) {
        return {};
      }
      std::vector<std::unique_ptr<Node<T>>> node_ownership;
      node_ownership.reserve(elements.size());
      std::vector<Node<T>*> tableau;
      for (const T& element : elements) {
        auto node = std::make_unique<Node<T>>(element);
        auto it = std::lower_bound(tableau.begin(), tableau.end(), node.get(),
                                   [&](const Node<T>* a, const Node<T>* b) {
                                     return compare(a->element, b->element);
                                   });
        if (it != tableau.begin()) {
          auto previous = it[-1];
          node->index = previous->index + 1;
          node->previous = previous;
        }
        if (it != tableau.end()) {
          *it = node.get();
        } else {
          tableau.push_back(node.get());
        }
        node_ownership.push_back(std::move(node));
      }
      Node<T>* longest = *std::max_element(
          tableau.begin(), tableau.end(),
          [](Node<T>* a, Node<T>* b) { return a->index < b->index; });
      std::vector<T> result(longest->index + 1);
      for (; longest != nullptr; longest = longest->previous) {
        result.at(longest->index) = longest->element;
      }
      return result;
    }
    
    int main() {
      for (int x : LongestIncreasingSubsequence(
               std::vector<int>{3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3},
               std::less<int>())) {
        std::cout << x << '\n';
      }
    }
    

    If you're fortunate enough to be working in a language with garbage collection, you can ignore the business with node_ownership and std::move.

    Here's a Python version.

    import bisect
    
    
    def longest_increasing_subsequence(elements):
        elements = list(elements)
        if not elements:
            return []
        # Build the tableau
        tableau_elements = []
        tableau_indexes = []
        predecessors = []
        for i, element in enumerate(elements):
            j = bisect.bisect_left(tableau_elements, element)
            predecessors.append(tableau_indexes[j - 1] if j > 0 else None)
            if j < len(tableau_elements):
                tableau_elements[j] = element
                tableau_indexes[j] = i
            else:
                tableau_elements.append(element)
                tableau_indexes.append(i)
        # Find the subsequence lengths
        lengths = []
        for i, predecessor in enumerate(predecessors):
            lengths.append(1 if predecessor is None else lengths[predecessor] + 1)
        # Extract the best subsequence
        best = max(range(len(lengths)), key=lambda i: lengths[i])
        subsequence = []
        while best is not None:
            subsequence.append(elements[best])
            best = predecessors[best]
        subsequence.reverse()
        return subsequence
    
    
    print(longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]))