Search code examples
haskellstack-overflow

Why I take a message for stack overflow in Haskell?


 import Data.Vector hiding((++))
 import System.Environment 
 d = generate 1000000 (\z->case z of
                      0 -> 2
                      1 -> 3
                      2 -> 5
                      otherwise -> if odd z then (d ! (z-1)) +2 else (d ! (z-1)) + 4)
 algorithmA _ _  1 pt  = pt
 algorithmA t k n pt = let dk = d ! k
                       q = div n dk
                       r = mod n dk
                      in if r /=0 then
                            if q>dk then
                              algorithmA t (k+1) n pt
                            else (n:pt)
                         else
                             algorithmA (t+1) k q (dk:pt)

main = do
        args<-getArgs
        let n = read (args !! 0)
        if (floor(sqrt(fromInteger n))) > Data.Vector.last d  then error ("The square  root of number is greater than " ++ show (Data.Vector.last d))
     else
         print (algorithmA 0 0 n [])

When I compile the above program and give for example in the command line test1 2222 I take the message "Stake space overflow: current size ... use +RTS -Ksize -RTS to increase ... ". But when I delete the if in the main function then the program works without problem. Also if I give the command Data.Vector.last d in the ghci the value is calculated without problem. So why this message is printed? When I increase the stack size to 20M the program plays without problem. The test1 is the name of executable.

Thanks.


Solution

  • The problem is that your code is being too lazy when constructing d. Remember that Data.Vector.Vector is a boxed vector type - that is, it is represented internally as an array of pointers to heap objects (which are either values or unevaluated thunks). So when you're populating d with generate, you are actually creating a vector of thunks. In your example, when the thunk at position n is accessed, it triggers the evaluation of thunks at positions n-1 and n-2, which in turn triggers evaluation of thunks n-3, n-4, n-5 and so on. So evaluating the last element causes the previous 1000000 - 1 elements to be evaluated, causing the stack to grow. This is why you get the stack overflow error.

    An easy way to fix this without modifying your code is to fully evaluate the whole vector before accessing the last element. In that case all thunks are evaluated in order and there is no stack overflow (since once a thunk has been evaluated, it's replaced with the value of the expression it represented, so when you're evaluating element n after having already evaluated elements n-1 and n-2, only those two elements have to be accessed and the cascading evaluation of all previous thunks is not triggered):

    import Control.DeepSeq (($!!))
    ...
    let l = V.last $!! d
    ...
    

    Testing:

    $ ghc -O2 Test.hs
    [1 of 1] Compiling Main             ( Test.hs, Test.o )
    Linking Test ...
    $ ./Test 2222    
    [101,11,2]
    

    Alternatively, you can use unboxed vectors (flat arrays of Ints):

    d :: U.Vector Int
    d = U.create $ do
      v <- M.new dSize
      go 0 v
        where
          dSize = 1000000
          go i v | i >= dSize = return v
                 | otherwise = do
                   val <- case i of
                     0 -> return 2
                     1 -> return 3
                     2 -> return 5
                     _ -> if odd i
                          then (+2) <$> (M.read v (i-1))
                          else (+4) <$> (M.read v (i-1))
                   M.write v i val
                   go (i+1) v