How to speed up Haskell IO with buffering?


I read about IO buffering in "Real World Haskell" (ch. 7, p. 189) and tried to test how different buffer sizes affect performance.

import System.IO
import Data.Time.Clock
import Data.Char(toUpper)

main :: IO ()
main = do
  hInp <- openFile "bigFile.txt" ReadMode
  let bufferSize = 2 ^ (10 :: Int)  -- buffer size in bytes; changed between runs
  hSetBuffering hInp (BlockBuffering (Just bufferSize))
  bufferMode <- hGetBuffering hInp
  putStrLn $ "Current buffering mode: " ++ (show bufferMode)

  startTime <- getCurrentTime
  inp <- hGetContents hInp
  writeFile "processed.txt" (map toUpper inp)
  hClose hInp
  finishTime <- getCurrentTime
  print $ diffUTCTime finishTime startTime
  return ()

Then I created "bigFile.txt":

-rw-rw-r-- 1 user user 96M Jan  26 09:49 bigFile.txt

and ran my program against this file with different buffer sizes:

Current buffering mode: BlockBuffering (Just 32)
9.744967s   

Current buffering mode: BlockBuffering (Just 1024)
9.667924s                                      

Current buffering mode: BlockBuffering (Just 1048576)
9.494807s    

Current buffering mode: BlockBuffering (Just 1073741824)
9.792453s   

But the running time is almost the same in every case. Is this normal, or am I doing something wrong?


Solution

  • On a modern OS, the buffer size is likely to have little effect on reading a file linearly, for two reasons: 1) the kernel performs read-ahead, and 2) the file may already be in the page cache if you have read it recently.

    Here is a program that measures the effect of buffer size on writes; its argument is the buffer size in bytes. Typical results are:

    $ ./mkbigfile 32      -- 12.864733s
    $ ./mkbigfile 64      --  9.668272s
    $ ./mkbigfile 128     --  6.993664s
    $ ./mkbigfile 512     --  4.130989s
    $ ./mkbigfile 1024    --  3.536652s
    $ ./mkbigfile 16384   --  3.055403s
    $ ./mkbigfile 1000000 --  3.004879s
    

    Source:

    import Control.Monad (replicateM_)
    import System.IO
    import System.Environment
    import Data.Time.Clock
    
    main :: IO ()
    main = do
      (arg:_) <- getArgs                       -- buffer size in bytes
      let size = read arg
      let bs = "abcdefghijklmnopqrstuvwxyz"
          n = 96000000 `div` (length bs)       -- number of lines to write
      h <- openFile "bigFile.txt" WriteMode
      hSetBuffering h (BlockBuffering (Just size))
      startTime <- getCurrentTime
      replicateM_ n $ hPutStrLn h bs
      hClose h
      finishTime <- getCurrentTime
      print $ diffUTCTime finishTime startTime
      return ()
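
To compare several buffer sizes in one run instead of invoking the program once per size, the write benchmark above could be wrapped in a helper. The following is only a sketch under some assumptions: the helper name `timeWrite`, the scratch file name `bufferBench.txt`, the line count and the list of sizes are all made up for illustration, and the timings will depend on the machine and file system.

```haskell
import Control.Monad (forM_, replicateM_)
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)
import System.IO

-- | Write @n@ copies of a fixed 26-character line to @path@ through a
-- handle with a block buffer of @size@ bytes. Returns the elapsed
-- wall-clock time, including the final flush done by 'hClose'.
-- (Hypothetical helper, not part of any library.)
timeWrite :: FilePath -> Int -> Int -> IO NominalDiffTime
timeWrite path size n = do
  h <- openFile path WriteMode
  hSetBuffering h (BlockBuffering (Just size))
  start <- getCurrentTime
  replicateM_ n (hPutStrLn h line)
  hClose h
  finish <- getCurrentTime
  return (diffUTCTime finish start)
  where
    line = "abcdefghijklmnopqrstuvwxyz"

main :: IO ()
main = do
  let path = "bufferBench.txt"  -- assumed scratch file, overwritten on each run
      n    = 1000000            -- arbitrary line count, roughly 27 MB per run
  -- Time the same workload with each buffer size and print one row per size.
  forM_ [32, 128, 1024, 16384, 1000000] $ \size -> do
    t <- timeWrite path size n
    putStrLn (show size ++ "\t" ++ show t)
```

Because every size is measured in the same process and against the same file, the rows are easier to compare, although the first run may still pay one-off costs such as creating the file.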