
How to output progress information in spite of Haskell's laziness?


Today I want Haskell to behave like any imperative language. Look at this:

import Data.HashMap.Strict as HashMap
import Data.Text.IO
import Data.Text
import Data.Functor ((<&>))

putStr "Reading data from file ..."
ls <- lines <$> readFile myFile
putStrLn " done."

putStr "Processing data ..."
let hmap = HashMap.fromList $ ls <&> \l -> case splitOn " " l of
        [k, v] -> (k, v)
        _      -> error "expecting \"key value\""
putStrLn " done."

Basically, the user should know what the program is doing at the moment. The result of this code is the immediate output of

> Reading data from file ... done.
> Processing data ... done.

... and then it starts doing the actual work, the output defeating its purpose.

I am well aware that it's a feature. Haskell is declarative and order of evaluation is determined by actual dependencies, not by line numbers in my .hs-file. Thus I try the following approach:

putStr "Reading data from file ..."
ls <- lines <$> readFile myFile
putStrLn $ ls `seq` " done."

putStr "Processing data ..."
let hmap = HashMap.fromList $ ls <&> \l -> case splitOn " " l of
        [k, v] -> (k, v)
        _      -> error "expecting \"key value\""
putStrLn $ hmap `seq` " done."

The idea: seq only returns once its first argument has been evaluated to Weak Head Normal Form. And it works, kind of. My program now outputs nothing for a while and then, once the work has been done, all the IO occurs.
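
To make the Weak Head Normal Form point concrete, here is a minimal sketch that is not part of my program. It uses evaluate from Control.Exception, which forces its argument to WHNF just like seq does; survivesWhnf is a hypothetical helper name introduced only for this illustration.

```haskell
import Control.Exception (SomeException, catch, evaluate)

-- Hypothetical helper: True if the value can be forced to WHNF without throwing.
survivesWhnf :: a -> IO Bool
survivesWhnf x = (evaluate x >> pure True) `catch` handler
  where
    handler :: SomeException -> IO Bool
    handler _ = pure False

main :: IO ()
main = do
  -- Only the outermost (:) is inspected; the undefined tail is never touched.
  survivesWhnf (1 : undefined :: [Int]) >>= print
  -- Here the outermost constructor itself is undefined, so forcing fails.
  survivesWhnf (undefined :: [Int]) >>= print
```

So forcing a plain list with seq only guarantees that its first cell exists, not that every element has been computed.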

Is there a way out of this?


EDIT: I changed the question in reply to Ben's answer. The imports should now make more sense and the program really runs.

DanielWagner commented about this related question:

GHCi and compiled code seem to behave differently

which indeed solves my problem.

putStrLn $ hmap `seq` " done."

does exactly what it's supposed to. I am only missing flushing stdout. So this actually does what I need:

putStr "Reading data from file ..."
hFlush stdout -- from System.IO
ls <- lines <$> readFile myFile
putStrLn $ ls `seq` " done."

putStr "Processing data ..."
hFlush stdout
let hmap = HashMap.fromList $ ls <&> \l -> case splitOn " " l of
        [k, v] -> (k, v)
        _      -> error "expecting \"key value\""
putStrLn $ hmap `seq` " done."
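
As an alternative to flushing after every putStr, the buffering mode of stdout can be switched off once at startup with hSetBuffering. The sketch below assumes that approach and wraps the "announce, force, report" pattern in a hypothetical helper called step, using evaluate from Control.Exception instead of seq inside the string. The workload is a stand-in, not my real data processing.

```haskell
import Control.Exception (evaluate)
import System.IO (BufferMode (NoBuffering), hSetBuffering, stdout)

-- Hypothetical helper: print a label, force the value to WHNF, then report completion.
step :: String -> a -> IO a
step label x = do
  putStr (label ++ " ...")
  y <- evaluate x        -- the work happens here, in sequence with the IO
  putStrLn " done."
  pure y

main :: IO ()
main = do
  -- With NoBuffering, the label appears immediately without an explicit hFlush.
  hSetBuffering stdout NoBuffering
  n <- step "Counting even numbers" (length (filter even [1 .. 5000000 :: Int]))
  print n
```

Note that evaluate, like seq, only forces to WHNF, so this is enough for an Int or a strict HashMap but not for a lazy list of unevaluated elements.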

Solution

  • You haven't given us the actual code that you say has this behaviour:

    My program now outputs nothing for a while and then, once the work has been done, all the IO occurs.

    How do I know it's not the code you're running? Because the code as posted doesn't compile, so it can't be run at all! A few problems:

    1. You get a type error from lines, because it's in the standard Prelude but that version works on String, and you're working with Text.
    2. You haven't imported splitOn from anywhere.
    3. The obvious splitOn to import is from Data.Text, but that has type Text -> Text -> [Text], i.e. it returns a list of Text, splitting at all occurrences of the separator. You're obviously expecting a pair, splitting only on the first separator.
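
    To illustrate the difference in point 3, here is a small sketch comparing Text.splitOn with Text.breakOn, which splits only at the first occurrence. parseLine is a hypothetical helper name, and the sample line is made up.

    ```haskell
    {-# LANGUAGE OverloadedStrings #-}

    import Data.Text (Text)
    import qualified Data.Text as Text

    -- Hypothetical helper: split a line into a pair at the first space only.
    parseLine :: Text -> (Text, Text)
    parseLine l =
      let (k, rest) = Text.breakOn " " l
      in (k, Text.drop 1 rest)  -- breakOn leaves the separator at the start of rest

    main :: IO ()
    main = do
      print (Text.splitOn " " "key some value")  -- ["key","some","value"]
      print (Text.breakOn " " "key some value")  -- ("key"," some value")
      print (parseLine "key some value")         -- ("key","some value")
    ```

    Note that breakOn on its own keeps the leading space in the second component, which is why the helper drops one character.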

    So at the very minimum this is code you were running in ghci after more imports/definitions that you haven't shown us.

    Changing it as little as I could to get it to run gave me this:

    {-# LANGUAGE OverloadedStrings #-}
    
    import qualified Data.HashMap.Strict as HashMap
    import qualified Data.Text.IO as StrictIO
    import qualified Data.Text as Text
    
    myFile = "data.txt"
    
    main = do
      putStr "Reading data from file ..."
      lines <- Text.lines <$> StrictIO.readFile myFile
      putStrLn $ lines `seq` " done."
    
      putStr "Processing data ..."
      let hmap = HashMap.fromList $ Text.breakOn " " <$> lines
      putStrLn $ hmap `seq` " done."
    

    I generated a very simple data file with 5,000,000 lines and ran the program with runhaskell foo.hs, and there are in fact noticeable pauses between the appearance of the reading/processing messages and the "done" appearing on each line.

    I see no reason why all of the IO would be delayed and appear at once (including the result of the first putStrLn). How are you actually running this code (or rather, the full and/or different code that actually runs)? In the post you've written it as input for GHCi rather than as a full program (judging by the imports and IO statements at the same level, with no do block or definition of any top-level functions). The only thought I had is that perhaps your data file is much smaller, so that the processing takes a barely perceptible amount of time and the initial startup processing of the Haskell code itself by ghci or runhaskell is the only noticeable delay. In that case I can imagine a slight delay followed by all the messages being printed seemingly at once.