Haskell: make a Writer as efficient as normal code when log is not needed

I'd like to write one code that could be run in two "modes":

either in logging mode, i.e. it should log some informations (in my case I want to log the number of calls done on some particular functions at a given time)
or in efficient mode, i.e. it does not log anything but just runs as fast as possible

I tried to write the following code, which creates two Writers, one normal one (for the logging mode) and one stupid one (that does not record anything, for the efficient mode). I then define a new class LogFunctionCalls that allows me to run my function in one of these two Writers.

However, I tried to compare the speed of the code using the Stupid writer, and it's significantly slower than the normal code without writer: here is the profiling informations:

code without writer: total time = 0.27s, total alloc = 55,800 bytes
code with stupid writer StupidLogEntry: total time = 0.74 s, total alloc = 600,060,408 bytes (NB: the real time is much bigger than 0.74s...)
code with real writer LogEntry: total time = 5.03 s, total alloc = 1,920,060,624 bytes

Here is the code (you can comment depending on which run you want to use):

{-# LANGUAGE ScopedTypeVariables #-}
module Main where

--- It depends on the transformers, containers, and base packages.

--- You can profile it with:
--- $ cabal v2-run --enable-profiling debug -- +RTS -p
--- and a file debug.prof will be created.

import qualified Data.Map.Strict as MapStrict
import qualified Data.Map.Merge.Strict as MapMerge

import qualified Control.Monad as CM
import Control.Monad.Trans.Writer.Strict (Writer)
import qualified Control.Monad.Trans.Writer.Strict as Wr
import qualified Data.Time as Time

-- Test using writer monad

-- The actual LogEntry, that should associate a number
-- to each name
newtype LogEntry = LogEntry { logMap:: MapStrict.Map String Int }
  deriving (Eq, Show)

-- A logentry that does not record anything, always empty
newtype StupidLogEntry = StupidLogEntry { stupidLogMap:: MapStrict.Map String Int }
  deriving (Eq, Show)

-- Create the Monoid instances
instance Semigroup LogEntry where
  (LogEntry m1) <> (LogEntry m2) =
    LogEntry $ MapStrict.unionWith (+) m1 m2
instance Monoid LogEntry where
  mempty = LogEntry MapStrict.empty

instance Semigroup StupidLogEntry where
  (StupidLogEntry m1) <> (StupidLogEntry m2) =
    StupidLogEntry $ m1
instance Monoid StupidLogEntry where
  mempty = StupidLogEntry MapStrict.empty

-- Create a class that allows me to use the function "myTell"
-- that adds a number in the writer (either the LogEntry
-- or StupidLogEntry one)
class (Monoid r) => LogFunctionCalls r where
  myTell :: String -> Int -> Writer r ()

instance LogFunctionCalls LogEntry where
  myTell namefunction n = do
    Wr.tell $ LogEntry $ MapStrict.singleton namefunction n

instance LogFunctionCalls StupidLogEntry where
  myTell namefunction n = do
    -- Wr.tell $ StupidLogEntry $ Map.singleton namefunction n
    return ()

-- Function in itself, with writers
countNumberCalls :: (LogFunctionCalls r) => Int -> Writer r Int
countNumberCalls 0 = return 0
countNumberCalls n = do
  myTell "countNumberCalls" 1
  x <- countNumberCalls $ n - 1
  return $ 1 + x

--- Without any writer, pretty efficient
countNumberCallsNoWriter :: Int -> Int
countNumberCallsNoWriter 0 = 0
countNumberCallsNoWriter n = 1 + countNumberCallsNoWriter (n-1)

main :: IO ()
main = do
  putStrLn $ "Hello"
  -- Version without any writter
  print =<< Time.getZonedTime
  let n = countNumberCallsNoWriter 15000000
  putStrLn $ "Without any writer, the result is " ++ (show n)
  -- Version with Logger
  print =<< Time.getZonedTime
  let (n, log :: LogEntry) = Wr.runWriter $ countNumberCalls 15000000
  putStrLn $ "The result is " ++ (show n)
  putStrLn $ "With the logger, the number of calls is " ++ (show $ (logMap log))
  -- Version with the stupid logger
  print =<< Time.getZonedTime
  let (n, log :: StupidLogEntry) = Wr.runWriter $ countNumberCalls 15000000
  putStrLn $ "The result is " ++ (show n)
  putStrLn $ "With the stupid logger, the number of calls is " ++ (show $ (stupidLogMap log))
  print =<< Time.getZonedTime

Solution

The Writer monad is the bottleneck. A better way to generalize your code so it can run in those two "modes" is to change the interface, i.e., the LogFunctionCalls class, to be parameterized by the monad:

class Monad m => LogFunctionCalls m where
  myTell :: String -> Int -> m ()

Then we can use an identity monad (or monad transformer) to implement it trivially:

newtype NoLog a = NoLog a
  deriving (Functor, Applicative, Monad) via Identity

instance LogFunctionCalls NoLog where
  myTell _ _ = pure ()

Note also that the function to test has a different type now, that no longer refers to Writer explicitly:

countNumberCalls :: (LogFunctionCalls m) => Int -> m Int

Let's stick it in a benchmark, which has all kinds of methodological issues as pointed out in the comments, but still, something interesting happens if we compile it with ghc -O:

main :: IO ()
main = do
  let iternumber = 1500000
  putStrLn $ "Hello"
  t0 <- Time.getCurrentTime

  -- Non-monadic version
  let n = countNumberCallsNoWriter iternumber
  putStrLn $ "Without any writer, the result is " ++ (show n)
  t1 <- Time.getCurrentTime
  print (Time.diffUTCTime t1 t0)

  -- NoLog version
  let n = unNoLog $ countNumberCalls iternumber
  putStrLn $ "The result is " ++ (show n)
  t2 <- Time.getCurrentTime
  print (Time.diffUTCTime t2 t1)

The output:

Hello
Without any writer, the result is 1500000
0.022030957s
The result is 1500000
0.000081533s

As we can see, the second version (the one we care about) took zero time. If we remove the first version from the benchmark, then the remaining one will take the 0.022s of the former.

So GHC actually optimized one of the two benchmarks away because it saw that they are the same, which achieves what we originally wanted: the "logging" code runs as fast as specialized code without logging because they're literally the same, and the benchmark numbers don't matter.

This can also be confirmed by looking at the generated Core; run ghc -O -ddump-simpl -ddump-to-file -dsuppres-all and make sense of the file Main.dump-simpl. Or use inspection-testing.

Compilable gist: https://gist.github.com/Lysxia/2f98c4a8a61034dcc614de5e95d7d5f8