Why does this greatest subsequential sum code work?


Rosetta Code has the following task, described here:

"Given a sequence of integers, find a continuous subsequence which maximizes the sum of its elements, that is, the elements of no other single subsequence add up to a value larger than this one. An empty subsequence is considered to have the sum of 0; thus if all elements are negative, the result must be the empty sequence."

This isn't a very complicated problem, and it can be comfortably solved in ten lines or fewer. However, their R solution confused me. I've reproduced it below:

max.subseq <- function(x) {
  cumulative <- cumsum(x)
  min.cumulative.so.far <- Reduce(min, cumulative, accumulate=TRUE)
  end <- which.max(cumulative-min.cumulative.so.far)
  begin <- which.min(c(0, cumulative[1:end]))
  if (end >= begin) x[begin:end] else x[c()]
}

In particular, I can't see why the min.cumulative.so.far variable is needed, and the final line's check of whether the index of some maximum is at least the index of some minimum seems very strange to me.

So why does this code work? I understand every individual function and its output, but I have no idea why putting them together like this would work, or why it would be picked over a simpler "generate a list of valid subsequences, and pick the one with the greatest sum" approach.


Solution

  • Let's break it down starting at the result of cumulative.

    1. min.cumulative.so.far gives, for every index, the minimum cumulative value up to and including that index.
    2. In the next step, cumulative - min.cumulative.so.far gives, for every index, the gain from the lowest cumulative value seen so far to the cumulative value at that index, i.e. the sum of the elements after that low point up to the index. end is the index where this gain is largest, which makes it the end of the chosen sub-sequence; it can be 1.
    3. This is where it gets interesting. which.min() finds the position of the lowest cumulative value up to end. Because a 0 is prepended to the vector, every position is shifted by one, so begin is the index of the first element after that low point. If the lowest cumulative value is negative, the sub-sequence starts with the next value in the vector. If none of the cumulative values up to end is negative, the prepended 0 is the minimum and begin is 1, so the sub-sequence starts at the beginning. If all values of x are negative, cumulative is strictly decreasing, so cumulative - min.cumulative.so.far is 0 everywhere and end is 1; c(0, cumulative[1:end]) is then c(0, x[1]), whose minimum is at position 2, so begin is larger than end and the function returns an empty vector.
    4. The return is then trivial. Empty if all values are negative as described above, otherwise x from the beginning to the end of the sub-sequence with the largest sum.
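
The steps above rest on one identity: the sum of x[b:e] equals the cumulative sum at e minus the cumulative sum just before b (taken as 0 when b is 1). Here is a minimal sketch that checks this on the standard-case vector used in this answer. The names padded and range.sum are made up for illustration and are not part of the Rosetta Code solution.

```r
x <- c(1, 5, -9, 3, 7, 1, 2, 4, 5, -6)
cumulative <- cumsum(x)

# Prepend 0 so that padded[b] is the cumulative sum just before index b.
padded <- c(0, cumulative)

# Hypothetical helper: sum of x[b:e] computed from cumulative sums only.
range.sum <- function(b, e) padded[e + 1] - padded[b]

range.sum(4, 9)   # same value as the direct sum below
sum(x[4:9])
```

Making the sum large therefore means picking an end with a high cumulative value and a begin just after a low one, which is what which.max() and which.min() look for.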

    Edge case: all values negative

    > x <-  -(1:10)
    > cumulative <- cumsum(x)
    > cumulative
     [1]  -1  -3  -6 -10 -15 -21 -28 -36 -45 -55
    > min.cumulative.so.far <- Reduce(min, cumulative, accumulate=TRUE)
    > min.cumulative.so.far
     [1]  -1  -3  -6 -10 -15 -21 -28 -36 -45 -55
    > end <- which.max(cumulative-min.cumulative.so.far)
    > end
    [1] 1
    > begin <- which.min(c(0, cumulative[1:end]))
    # c(0, cumulative[1:end]) is c(0, -1) in this case as 1:end = 1:1
    > begin
    [1] 2
    > if (end >= begin) x[begin:end] else x[c()]
    integer(0)
    

    Standard case

    > x <- c(1, 5, -9, 3, 7, 1, 2, 4, 5, -6)
    > cumulative <- cumsum(x)
    > cumulative
     [1]  1  6 -3  0  7  8 10 14 19 13
    > min.cumulative.so.far <- Reduce(min, cumulative, accumulate=TRUE)
    > min.cumulative.so.far
     [1]  1  1 -3 -3 -3 -3 -3 -3 -3 -3
    > end <- which.max(cumulative-min.cumulative.so.far)
    > end
    [1] 9
    > begin <- which.min(c(0, cumulative[1:end]))
    > begin
    [1] 4
    > if (end >= begin) x[begin:end] else x[c()]
    [1] 3 7 1 2 4 5
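
To cross-check outputs like the two above, the simpler approach mentioned in the question can be sketched as a brute-force search over every begin/end pair. This is an illustrative sketch only, not the Rosetta Code solution, and the function name max.subseq.brute is an assumption made here. It examines every contiguous run, so it does far more work than the cumsum version on long inputs.

```r
# Hypothetical brute-force reference: try every contiguous run and keep
# the one with the largest sum; the empty run (sum 0) is the baseline.
max.subseq.brute <- function(x) {
  best.sum <- 0
  best <- x[c()]
  for (b in seq_along(x)) {
    for (e in b:length(x)) {
      s <- sum(x[b:e])
      if (s > best.sum) {
        best.sum <- s
        best <- x[b:e]
      }
    }
  }
  best
}

max.subseq.brute(-(1:10))                             # empty vector
max.subseq.brute(c(1, 5, -9, 3, 7, 1, 2, 4, 5, -6))   # 3 7 1 2 4 5
```

Comparing this reference against max.subseq() on a handful of small vectors is a cheap way to see which inputs the two agree on.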