Search code examples
performancehaskellghc

Haskell speed issue where executing both parts of a program takes significantly longer than either part alone


I have a Haskell program which has 2 lines of code in the main:

putStrLn $ "Day11: part1: " ++ show (sum $ bigManhattan 1 galaxies <$> pairs)
putStrLn $ "Day11: part2: " ++ show (sum $ bigManhattan 999999 galaxies <$> pairs)

If I comment out either one of them the program runs in 0.01s. With both of them extant the program takes 90s.

I wonder if anyone has any ideas? Could they be competing to look at the data and getting in each others way?

Compiler options - I don't know what most of these do...

ghc-options:
- -Wall
- -Wcompat
- -Widentities
- -Wincomplete-record-updates
- -Wincomplete-uni-patterns
- -Wmissing-export-lists
- -Wmissing-home-modules
- -Wpartial-fields
- -Wredundant-constraints
- -O2

The code:

module Day11(day11) where

import Data.List ((\\))
import Data.Maybe (catMaybes)


type Coord = (Int, Int)


manhattan :: Coord -> Coord -> Int
manhattan (x1, y1) (x2, y2) = abs (x1 - x2) + abs (y1 - y2)


getF :: (String -> a) -> Int -> IO a
getF f n = do
  s <- readFile $ "./Data/Day" ++ show n ++ ".in"
  return $ f s


getLines :: Int -> IO [String]
getLines = getF lines


parse :: [String] -> [Coord]
parse css = concatMap (catMaybes . (\(y, cs) -> (\(x, c) -> if c=='#' then Just (x,y) else Nothing) <$> zip [0..] cs)) (zip [0..] css)


nSize :: Int
nSize = 140


bigManhattan :: Int -> [Coord] -> (Coord, Coord) -> Int
bigManhattan k galaxies ((c1, r1), (c2, r2)) = manhattan (c1+newc1, r1+newr1) (c2+newc2, r2+newr2)
  where
    baseC, baseR :: [Int]
    baseC = [0..(nSize-1)] \\ (fst <$> galaxies)
    baseR = [0..(nSize-1)] \\  (snd <$> galaxies)
    newc1, newc2, newr1, newr2 :: Int
    newc1 = k * length (filter (c1>) baseC)
    newc2 = k * length (filter (c2>) baseC)
    newr1 = k * length (filter (r1>) baseR)
    newr2 = k * length (filter (r2>) baseR)


day11 :: IO ()
day11 = do
  ls <- getLines 11
  let galaxies = parse ls
      pairs = [(x,y) | x <- galaxies, y <- galaxies, x<y ]

  putStrLn $ "Day11: part1: " ++ show (sum $ bigManhattan 1 galaxies <$> pairs)
  --putStrLn $ "Day11: part2: " ++ show (sum $ bigManhattan 999999 galaxies <$> pairs)
  return ()

Using GHC 9.4.7.


Solution

  • Note that baseC and baseR depend on galaxies but not the pairs. When you print only one of the two results, GHC inlines all of bigManhattan and is able to lift baseC and baseR out of the sum. But, when you print both results, the inlining is suppressed, and baseC and baseR get recalculated for every bigManhattan call on every element of pairs. The simplest way to work around this is to move the sum and map into bigManhattan to ensure baseC and baseR only get calculated once per sum. You could do even better, since they don't depend on k either, but this seems to work fast enough.

    bigManhattan :: Int -> [Coord] -> [(Coord, Coord)] -> Int
    bigManhattan k galaxies = sum . map go
      where
        baseC, baseR :: [Int]
        baseC = [0..(nSize-1)] \\ (fst <$> galaxies)
        baseR = [0..(nSize-1)] \\  (snd <$> galaxies)
    
        go ((c1,r1),(c2,r2)) = manhattan (c1+newc1, r1+newr1) (c2+newc2, r2+newr2)
          where newc1, newc2, newr1, newr2 :: Int
                newc1 = k * length (filter (c1>) baseC)
                newc2 = k * length (filter (c2>) baseC)
                newr1 = k * length (filter (r1>) baseR)
                newr2 = k * length (filter (r2>) baseR)
    
    
    day11 :: IO ()
    day11 = do
      ls <- getLines 11
      let galaxies = parse ls
          pairs = [(x,y) | x <- galaxies, y <- galaxies, x<y ]
    
      putStrLn $ "Day11: part1: " ++ show (bigManhattan 1 galaxies pairs)
      putStrLn $ "Day11: part2: " ++ show (bigManhattan 999999 galaxies pairs)
      return ()