Haskell bottom-up DP space leak

Apologies if this is too specific, I am new here and not exactly sure what is reasonable. I have been bashing my head against this problem for hours with nothing to show for it. The following code is my implementation of a competitive programming problem.

module Main where

import Data.List (foldl', groupBy)
import Debug.Trace

type Case = (Int, [(Int, Int)])
type Soln = Int

main = interact handle

handle :: String -> String
handle = fmt . solve . parse

fmt :: Soln -> String
fmt s = (show s) ++ "\n"

parse :: String -> Case
parse s = (l, fs)
  where
    (l:_:fs') = (map read) $ words s
    fs = pairs fs'

pairs :: [a] -> [(a, a)]
pairs [] = []
pairs (a:b:s) = (a, b):(pairs s)

solve :: Case -> Soln
solve c@(l, fs) = last $ foldl' run [0..l] f
  where
    f = concat $ map rep $ map combine $ groupBy samev fs
    samev a b = (snd a) == (snd b)
    combine a = (sum $ map fst $ a, snd $ head $ a)
    rep (n, v) = replicate (min n (l `div` v)) v

run :: [Int] -> Int -> [Int]
run b v = (take v b) ++ (zipWith min b (drop v b))
-- run b v = (take v b) ++ (zipMin b (drop v b))

zipMin :: [Int] -> [Int] -> [Int]
zipMin [] _ = []
zipMin _ [] = []
zipMin (a:as) (b:bs) = (min a b):(zipMin as bs)

The intent is that this works like a bottom-up dynamic programming solution generating each row of the DP table from the previous using the fold in solve. In theory GHC should be able to optimize out all the old rows of the table. However, running this program on a moderately large input with approximately l = 2000 and length f = 5000, I get this:

> time ./E < E.in 
0
1.98user 0.12system 0:02.10elapsed 99%CPU (0avgtext+0avgdata 878488maxresident)k
0inputs+0outputs (0major+219024minor)pagefaults 0swaps

That's using 878 MB of memory at peak! The table I am generating is only 10,000 Ints, and in theory I only need one row at a time! It seems obvious that this is some form of thunk leak or other space leak. Profiling reveals that run is consuming 99% of total runtime and memory. Digging further indicated that 98% of that was in the zipWith call. Interestingly, replacing the call to zipWith min with my custom zipMin function produces a significant improvement:

> time ./E < E.in 
0
1.39user 0.08system 0:01.48elapsed 99%CPU (0avgtext+0avgdata 531400maxresident)k
0inputs+0outputs (0major+132239minor)pagefaults 0swaps

That's just 70% the run time, and 60% the memory! I tried all sorts to make this work. I know (++) is generally a bad idea, so I replaced the lists in run with Data.Sequence.Seq Int... and it got slower and used more memory. I am not particularly experienced with Haskell, but I am at my wit's end here. I am sure the answer to this problem exists somehwere on SO, but I am too new to Haskell to be able to find it, it seems.

Any help any of you can offer is very much appreciated. I would love an explanation of exactly what I have done wrong, how to diagnose it in future, and how to fix it.

Thanks in advance.

EDIT:

After following Steven's Excellent advice and replacing my lists with unboxed vectors the performance is... uh... signficantly improved:

> time ./E < E.in 
0
0.01user 0.00system 0:00.02elapsed 80%CPU (0avgtext+0avgdata 5000maxresident)k
24inputs+0outputs (0major+512minor)pagefaults 0swaps

Solution

So, by using foldl' you have ensured that the accumulator will be in WHNF. Putting a list in WHNF only evaluates the first element of the list. The remainder of the list exists as a thunk, and will be passed around as a thunk to the subsequent calls of your fold. Since you are interested in multiple positions in the list at once (that is, you are dropping some parts of it in the zipWith) large portions of the lists are being kept from previous iterations.

The structure you need here is an unboxed vector. An unboxed vector will ensure that everything is maximally strict, and will run in far less memory.