Today I want Haskell to behave like any imperative language. Look at this:
import Data.HashMap.Strict as HashMap
import Data.Text.IO
import Data.Text
import Data.Functor ((<&>))
putStr "Reading data from file ..."
ls <- lines <$> readFile myFile
putStrLn " done."
putStr "Processing data ..."
let hmap = HashMap.fromList $ ls <&> \l -> case splitOn " " l of
        [k, v] -> (k, v)
        _ -> error "expecting \"key value\""
putStrLn " done."
Basically, the user should know what the program is doing at the moment. The result of this code is the immediate output of
> Reading data from file ... done.
> Processing data ... done.
... and then it starts doing the actual work, the output defeating its purpose.
I am well aware that it's a feature. Haskell is declarative and order of evaluation is determined by actual dependencies, not by line numbers in my .hs-file. Thus I try the following approach:
putStr "Reading data from file ..."
ls <- lines <$> readFile myFile
putStrLn $ ls `seq` " done."
putStr "Processing data ..."
let hmap = HashMap.fromList $ ls <&> \l -> case splitOn " " l of
        [k, v] -> (k, v)
        _ -> error "expecting \"key value\""
putStrLn $ hmap `seq` " done."
The idea: seq only returns once its first argument has been evaluated to Weak Head Normal Form. And it works, kind of. The output of my program is now nothing for a while and then, once the work has been done, all the IO occurs.
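For reference, here is a minimal sketch of what "evaluated to Weak Head Normal Form" means for seq. The list partial is made up purely for illustration: its first cons cell is defined, its tail is not. seq only forces the outermost constructor, so it succeeds, while anything that walks the whole spine fails.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical example value: the first cons cell exists, the tail is undefined.
partial :: [Int]
partial = 1 : undefined

main :: IO ()
main = do
  -- seq stops at the outermost (:) constructor, so the undefined tail is never touched.
  putStrLn (partial `seq` "WHNF reached without touching the tail")
  -- length has to walk the whole spine, so it runs into the undefined tail.
  r <- try (evaluate (length partial)) :: IO (Either SomeException Int)
  putStrLn (either (const "forcing the whole spine failed") show r)
```

So seq on a list or a map guarantees only that the outermost constructor has been computed; how much work that implies depends on how strict the underlying data structure is.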
Is there a way out of this?
EDIT: I changed the question in reply to Ben's answer. The imports should now make more sense and the program really runs.
DanielWagner commented about this related question, "GHCi and compiled code seem to behave differently", which indeed solves my problem.
putStrLn $ hmap `seq` " done."
does exactly what it's supposed to. All I was missing was flushing stdout. So this actually does what I need:
putStr "Reading data from file ..."
hFlush stdout -- from System.IO
ls <- lines <$> readFile myFile
putStrLn $ ls `seq` " done."
putStr "Processing data ..."
hFlush stdout
let hmap = HashMap.fromList $ ls <&> \l -> case splitOn " " l of
        [k, v] -> (k, v)
        _ -> error "expecting \"key value\""
putStrLn $ hmap `seq` " done."
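The same pattern can be wrapped in a small helper. This is only a sketch under a few assumptions: the name step is made up for illustration, the summing workload stands in for the real file processing, and evaluate (from Control.Exception) is swapped in for seq so that the forcing to WHNF happens at a definite point in the IO sequence.

```haskell
import Control.Exception (evaluate)
import System.IO (hFlush, stdout)

-- Hypothetical helper: announce a step, flush stdout, run the action,
-- force its result to WHNF, and only then report completion.
step :: String -> IO a -> IO a
step msg action = do
  putStr (msg ++ " ...")
  hFlush stdout             -- make the message visible before the work starts
  x <- action >>= evaluate  -- force the result to WHNF right here
  putStrLn " done."
  pure x

main :: IO ()
main = do
  -- Stand-in workload; replace with the real reading/processing actions.
  n <- step "Summing numbers" (pure (sum [1 .. 1000000 :: Int]))
  print n
```

An alternative to flushing by hand would be to call hSetBuffering stdout NoBuffering (also from System.IO) once at the start of the program.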
You haven't given us the actual code that you say has this behaviour:
"The output of my program is now nothing for a while and then, once the work has been done, all the IO occurs."
How do I know it's not the code you're running? Your code doesn't even compile, let alone run! A few problems:
- lines is in scope only because it's in the standard Prelude, but that version works on String, and you're working with Text.
- You haven't imported splitOn from anywhere.
- The obvious splitOn to import is from Data.Text, but that has type Text -> Text -> [Text], i.e. it returns a list of Text, splitting at all occurrences of the separator. You're obviously expecting a pair, splitting only on the first separator.
So at the very minimum this is code you were running in ghci after more imports/definitions that you haven't shown us.
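To make the difference between a list result and a pair result concrete, here is a small sketch comparing Data.Text.splitOn with Data.Text.breakOn. The sample string is made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as Text

main :: IO ()
main = do
  -- splitOn breaks at every occurrence of the separator and returns a list.
  print (Text.splitOn " " "key some value")
  -- breakOn splits once, at the first occurrence, and returns a pair;
  -- the separator stays at the front of the second component.
  print (Text.breakOn " " "key some value")
```

Note that with breakOn the value still carries the leading separator, so it may need stripping depending on what the map is used for.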
Changing it as little as I could to get it to run gave me this:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.HashMap.Strict as HashMap
import qualified Data.Text.IO as StrictIO
import qualified Data.Text as Text
myFile = "data.txt"
main = do
  putStr "Reading data from file ..."
  lines <- Text.lines <$> StrictIO.readFile myFile
  putStrLn $ lines `seq` " done."
  putStr "Processing data ..."
  let hmap = HashMap.fromList $ Text.breakOn " " <$> lines
  putStrLn $ hmap `seq` " done."
I generated a very simple data file with 5,000,000 lines and ran the program with runhaskell foo.hs, and there are in fact noticeable pauses between the appearance of the reading/processing messages and the "done" appearing on each line.
I see no reason why all of the IO would be delayed and appear at once (including the result of the first putStrLn). How are you actually running this code (or rather, the full and/or different code that actually runs)? In the post you've written it as input for GHCi rather than a full program (judging by the imports and IO statements at the same level, with no do block or definition of any top-level functions). The only thought I had is that perhaps your data file is much smaller, such that the processing takes a barely perceptible amount of time, and the initial startup processing of the Haskell code itself by ghci or runhaskell is the only noticeable delay; then I can imagine there being a slight delay followed by the printing of all the messages seemingly at once.