Rosetta Code has the following task, described here:
"Given a sequence of integers, find a continuous subsequence which maximizes the sum of its elements, that is, the elements of no other single subsequence add up to a value larger than this one. An empty subsequence is considered to have the sum of 0; thus if all elements are negative, the result must be the empty sequence."
This isn't a very complicated problem, and can be comfortably solved in ten lines or less. However, their R solution confused me. I've reproduced it below:
max.subseq <- function(x) {
  cumulative <- cumsum(x)
  min.cumulative.so.far <- Reduce(min, cumulative, accumulate=TRUE)
  end <- which.max(cumulative-min.cumulative.so.far)
  begin <- which.min(c(0, cumulative[1:end]))
  if (end >= begin) x[begin:end] else x[c()]
}
In particular, I can't see why the min.cumulative.so.far variable is needed, and the final line's idea of checking whether the index of some maximum is greater than the index of a minimum seems very strange to me.
So why does this code work? I understand every individual function and its output, but I have no idea why putting them together like this would work, or why it would be picked over a simpler "generate a list of valid subsequences, and pick the one with the greatest sum" approach.
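For comparison, the simpler "try every subsequence" approach mentioned above could look roughly like the sketch below. It is only an illustration: max.subseq.brute is a hypothetical name, not part of the Rosetta Code solution, and the sketch assumes x is a plain numeric vector. It sums every slice x[i:j] from scratch, so it does far more work than a single pass over cumsum(x).

```r
# Hypothetical brute-force helper, for illustration only.
max.subseq.brute <- function(x) {
  best.sum <- 0      # the empty subsequence has sum 0
  best <- x[0]       # zero-length vector of the same type as x
  n <- length(x)
  for (i in seq_len(n)) {
    for (j in i:n) {
      s <- sum(x[i:j])
      # Only a strictly larger sum replaces the current best, so an
      # all-negative input leaves the empty subsequence in place.
      if (s > best.sum) {
        best.sum <- s
        best <- x[i:j]
      }
    }
  }
  best
}

max.subseq.brute(c(1, 5, -9, 3, 7, 1, 2, 4, 5, -6))
max.subseq.brute(-(1:10))
```

The first call should select 3 7 1 2 4 5, and the second should give a zero-length vector.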
Let's break it down, starting at the result of cumulative.
min.cumulative.so.far gives you, for every index, the lowest value of cumulative at or before that index.
cumulative - min.cumulative.so.far gives you the sum of the best sub-sequence for every element of the sequence, measured from the min.cumulative.so.far to the value at each index.
end is the index where that difference is largest, so it is the end of the best sub-sequence. It can be 1: if all values are negative, end will be 1 and all values in cumulative[1:end] will be negative, so which.min() returns 2, because that value is less than the 0 at index 1. begin is then larger than end, and the function returns an empty vector.
Otherwise begin is set to the first index after the lowest value of cumulative before index end, which, as already established, is the last element of the largest sub-sequence. This is why 0 is added to the vector: it shifts every position by one, so the index of the minimum in the padded vector is the index of the element just after that minimum in x. If the relevant min.cumulative.so.far is negative, the sub-sequence starts with the next value in the vector. If all the cumulative values are positive, it simply starts at the beginning (the added 0 will be the lowest value). All this step really does is find the index of the minimum cumulative up to index end.
The last line returns x from the beginning to the end of the sub-sequence with the largest sum.
Edge case: all values negative
> x <- -(1:10)
> cumulative <- cumsum(x)
> cumulative
[1] -1 -3 -6 -10 -15 -21 -28 -36 -45 -55
> min.cumulative.so.far <- Reduce(min, cumulative, accumulate=TRUE)
> min.cumulative.so.far
[1] -1 -3 -6 -10 -15 -21 -28 -36 -45 -55
> end <- which.max(cumulative-min.cumulative.so.far)
> end
[1] 1
> begin <- which.min(c(0, cumulative[1:end]))
# c(0, cumulative[1:end]) is c(0, -1) in this case as 1:end = 1:1
> begin
[1] 2
> if (end >= begin) x[begin:end] else x[c()]
integer(0)
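The explicit end >= begin test in the last line is needed because of how R's colon operator behaves: when begin is larger than end, begin:end counts downwards instead of producing an empty range. A small sketch using the values from the edge case above:

```r
x <- -(1:10)
begin <- 2
end <- 1

begin:end       # counts down from 2 to 1, it is not an empty range
x[begin:end]    # so this would select two elements of x, not zero
x[c()]          # an empty index selects nothing and gives integer(0)
```

Without the guard, the function would return the first two elements in reverse order rather than the empty vector the task asks for.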
Standard case
> x <- c(1, 5, -9, 3, 7, 1, 2, 4, 5, -6)
> cumulative <- cumsum(x)
> cumulative
[1] 1 6 -3 0 7 8 10 14 19 13
> min.cumulative.so.far <- Reduce(min, cumulative, accumulate=TRUE)
> min.cumulative.so.far
[1] 1 1 -3 -3 -3 -3 -3 -3 -3 -3
> end <- which.max(cumulative-min.cumulative.so.far)
> end
[1] 9
> begin <- which.min(c(0, cumulative[1:end]))
> begin
[1] 4
> if (end >= begin) x[begin:end] else x[c()]
[1] 3 7 1 2 4 5
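The reason differences of cumulative sums matter at all is that the sum of any slice x[begin:end] equals the cumulative sum at end minus the cumulative sum just before begin. A minimal check on the standard case, where prefix is an illustrative name that does not appear in the original function:

```r
x <- c(1, 5, -9, 3, 7, 1, 2, 4, 5, -6)
prefix <- c(0, cumsum(x))   # prefix[k] is the sum of the first k - 1 elements

begin <- 4
end <- 9
sum(x[begin:end])                 # direct sum of the slice
prefix[end + 1] - prefix[begin]   # same value from two cumulative sums
```

Both expressions give 22 here, which is 19 (cumulative at index 9) minus -3 (cumulative at index 3). Making the slice sum as large as possible therefore means pairing a high cumulative value with the lowest cumulative value that comes before it.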