Search code examples
haskellbenchmarkinghaskell-criterion

Benchmarking IO action with Criterion


I want to know how long it takes my program to read a 12.9MB .wav file into memory. The function that reads a file into memory looks as follows:

import qualified Data.ByteString        as BS

getSamplesFromFileAsBS :: FilePath -> IO (BS.ByteString)

It takes the name of the file and returns the samples as a ByteString. It also performs some other validity checks on the data and ignores the header information. I read the ByteString of samples into memory using ByteString.hGet.

If I now benchmark this function with a 12.9MB file, using Criterion:

bencher :: FilePath -> IO ()
bencher fp = defaultMain [
  bench "Reading all the samples from a file." $ nfIO (getSamplesFromFileAsBS fp)
  ]

I get the following result:

benchmarking Reading all the samples from a file.
time                 3.617 ms   (3.520 ms .. 3.730 ms)
                     0.989 R²   (0.981 R² .. 0.994 R²)
mean                 3.760 ms   (3.662 ms .. 3.875 ms)
std dev              354.0 μs   (259.9 μs .. 552.5 μs)
variance introduced by outliers: 62% (severely inflated)

It seems to load 12.9MB into memory in 3.617ms. This doesn't seem realistic since it indicates that my SSD can read 3+GB/s, which is not the case at all. What am I doing wrong?

I decided to try this another (more naive) way, by manually measuring the time difference:

runBenchmarks :: FilePath -> IO ()
runBenchmarks fp = do
  start <- getCurrentTime
  samplesBS <- getSamplesFromFileAsBS fp
  end <- samplesBS `deepseq` getCurrentTime
  print (diffUTCTime end start)

This gives me the following result: 0.023105s. This is realistic because it would mean my SSD can read at a speed of around 600MB/s. What is wrong with the Criterion result?


Solution

  • I looked at the visual results of my Criterion benchmark by writing the output to a html file. I could clearly see that the first run took around 0.020s, whereas the rest (after caching) takes around 0.003s.

    So I'm getting these results because of caching.