Search code examples
algorithmocamlproof

Using induction to prove linear maximum subarray algorithm


Here's my implementation of Kadane's algorihm that I wrote OCaml:

let rec helper max_now max_so_far f n index = 
  if n < index then max_so_far
  else if max_now + f index < 0 
  then helper 0 max_so_far f n (index+1)
  else helper (max_now + (f index)) 
      (max max_so_far (max_now + (f index))) f n (index+1)

let max_sum f n = helper 0 0 f n 1

Now I want to prove formally that it is correct. My code's specification:

The function f for arguments 1 to n returns an integer. The function max_sum with parametres f n should return the biggest sum of sums:

sum

Where 1<=a<=b<=n.

You don't need to analyse my code - I'm just trying to prove that Kadane's algorithm works using induction. The problem is I don't know how to approach this problem - proving simple procedures like factoring or factorials is quite straightforward but when it comes to something more abstract like finding the subarray with maximum sum I don't even know where to start. Any hints?


Solution

  • The basis of the induction won't go through, since the algorithm and the specification disagree on whether an empty subarray (having sum 0) is allowed. I'm going to follow the specification, which disallows empty subarrays.

    The inductive step is as follows. For all k, define

    max_now  = max_{a in 1..k  } sum_{i in a..k  } f(i)
    max_now' = max_{a in 1..k+1} sum_{i in a..k+1} f(i)
    max_so_far  = max_{b in 1..k  } max_{a in 1..b} sum_{i in a..b} f(i)
    max_so_far' = max_{b in 1..k+1} max_{a in 1..b} sum_{i in a..b} f(i).
    

    We need to show that

    max_now' = max(max_now, 0) + f(k+1)
    max_so_far' = max(max_so_far, max_now').
    

    Both equalities are proved by splitting a max.

    max_now' =      max_{a in 1..k+1} sum_{i in a  ..k+1} f(i)
             = max( max_{a in 1..k  } sum_{i in a  ..k+1} f(i),
                                      sum_{i in k+1..k+1} f(i))
             = max((max_{a in 1..k  } sum_{i in a  ..k  } f(i)) + f(k+1),
                                                                  f(k+1))
             = max(max_now, 0) + f(k+1)
    
    max_so_far' =     max_{b in 1..k+1} max_{a in 1..b  } sum_{i in a..b  } f(i)
                = max(max_{b in 1..k  } max_{a in 1..b  } sum_{i in a..b  } f(i),
                                        max_{a in 1..k+1} sum_{i in a..k+1} f(i))
                = max(max_so_far, max_now')