haskell file-io io html-parsing lazy-evaluation

Haskell: how to force evaluation of functions and write to a file sequentially?


I have a problem with lazy IO in Haskell. Despite reading other questions on the topic, I couldn't figure out how to solve my specific case.

I'm using the scalpel package to parse HTML. The use case is simple: one site contains links to other sites, each of which describes some kind of event. So I wrote the following structures (I left out some of the implementations here):

type Url = String

-- function that parses all urls
allUrls :: Url -> IO (Maybe [Url])

data Event = Event { ... }

-- function that parses an event
parseEvent :: Url -> IO (Maybe Event)

-- function that writes the event to a file
doThings :: Url -> IO ()
doThings url = return url >>= parseEvent >>= (appendFile "/tmp/foo.txt" . show)

-- function that should take all urls and write their events to a file
allEvents :: IO (Maybe [Url]) -> IO (Maybe (IO [()]))
allEvents urls = urls >>= return . liftM (mapM doThings)

-- or alternatively:

-- function that takes all urls and returns all events
allEvents :: IO (Maybe [Url]) -> IO (Maybe (IO [Maybe Event]))
allEvents urls = urls >>= return . liftM (mapM parseEvent)

-- some function that writes all events to a file
allEventsToFile :: IO (Maybe (IO [Maybe Event])) -> IO()
??? 

The doThings function works as expected. Given a url, it parses the corresponding event and writes it to the file. But allEvents does absolutely nothing because of laziness. How can I force the evaluation inside allEvents?


Solution

  • This is not a problem of lazy IO. Lazy IO is when you read a lazy string from a file, but don't evaluate it – the runtime will in this case defer the actual reading until you evaluate it.

    The problem is actually that you don't do any IO in allEvents: you're merely shoving around values in the IO functor. Those values happen to be IO actions themselves, but that doesn't matter. Specifically, a >>= return . f is always the same as just fmap f a, by the monad laws. And fmap in IO only transforms the result of an action; it never runs an action contained in that result.
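
    To see the difference concretely, here is a minimal sketch. The names bump and wrapped are made up for illustration, and a counter in an IORef stands in for the file write so that the effect is easy to observe. It only shows the general behaviour of fmap versus join in IO, not the scalpel code itself.

```haskell
import Control.Monad (join)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for doThings: bumps a counter instead of writing a file.
bump :: IORef Int -> IO ()
bump ref = modifyIORef' ref (+ 1)

-- Same shape as `urls >>= return . f`: the inner action is only wrapped up
-- as the *result* of the outer action.
wrapped :: IORef Int -> IO (IO ())
wrapped ref = fmap (\_ -> bump ref) (return ())

main :: IO ()
main = do
  ref <- newIORef 0
  _ <- wrapped ref          -- outer action runs, inner action is discarded
  readIORef ref >>= print   -- prints 0
  join (wrapped ref)        -- join runs the outer action, then the inner one
  readIORef ref >>= print   -- prints 1
```

    So a value of type IO (IO a) has to be flattened (with join or another >>=) before the inner action ever runs.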

    This problem is already visible in the type signature: -> IO (Maybe (IO [()])) says that the function yields IO actions that you could then later execute. But in this case, you want to execute everything when you execute allEvents. So the signature could be

    allEvents :: IO (Maybe [Url]) -> IO ()
    

    (or perhaps -> IO (Either EventExecError ()), if you want to properly handle failure).

    This is probably still not what you want: why do you take an IO action as the argument? That means allEvents would itself need to execute that action to first fetch the URLs, before doing any work of its own. That action could have its own side effects and give different results on different calls. Do you want that?

    I guess not, so really it should be

    allEvents :: Maybe [Url] -> IO ()
    

    Now you start out with a plain Maybe value, which you can easily pattern-match on:

    allEvents Nothing = return ()  -- simplest choice: do nothing if no URLs were parsed
    allEvents (Just urls) = mapM_ doThings urls
    

    To then use that in your program, you need to monadically bind the url-fetching to the event-executing:

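    main :: IO ()
    main = do
      urlq <- allUrls indexUrl  -- indexUrl: the Url of the page listing the events (name assumed here)
      allEvents urlq
    

    ...or, for short, allUrls indexUrl >>= allEvents. (allUrls takes the Url of the index page as its argument, so it has to be applied before binding.)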
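
    Putting the pieces together, a self-contained sketch could look like the following. Several things here are assumptions and not part of the question: allUrls and parseEvent are stubs that return fixed data instead of scraping with scalpel, Event is reduced to a single made-up field, the example.org URLs are placeholders, and doThings and allEvents take the output path as an extra argument so the sketch can be run against any file.

```haskell
type Url = String

-- Simplified stand-in for the question's Event record (its fields are elided there).
newtype Event = Event { eventSource :: Url } deriving Show

-- Stub: the real version would scrape the index page with scalpel.
allUrls :: Url -> IO (Maybe [Url])
allUrls _ = return (Just ["http://example.org/a", "http://example.org/b"])

-- Stub: the real version would scrape a single event page.
parseEvent :: Url -> IO (Maybe Event)
parseEvent url = return (Just (Event url))

-- Parse one event and append it to the given file, one entry per line.
doThings :: FilePath -> Url -> IO ()
doThings out url = parseEvent url >>= appendFile out . (++ "\n") . show

-- Run doThings for every URL, in order; do nothing if URL parsing failed.
allEvents :: FilePath -> Maybe [Url] -> IO ()
allEvents _   Nothing     = return ()
allEvents out (Just urls) = mapM_ (doThings out) urls

main :: IO ()
main = allUrls "http://example.org/index" >>= allEvents "/tmp/foo.txt"
```

    Since Maybe is Foldable, the two allEvents clauses could also be collapsed into mapM_ (mapM_ (doThings out)); the explicit pattern match just makes the Nothing case easier to change later.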