
Haskell streaming download


The two resources I found that suggested recipes for streaming downloads using popular Haskell libraries were:

How would I modify the code in the former to (a) save to file, and (b) print only a (take 5) of the byte response, rather than the whole response to stdout?

My attempt at (b) is:

#!/usr/bin/env stack
{- stack --install-ghc --resolver lts-5.13 runghc
   --package http-conduit
 -}
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad.IO.Class (liftIO)
import qualified Data.ByteString        as S
import qualified Data.Conduit.List      as CL
import           Network.HTTP.Simple
import           System.IO              (stdout)

main :: IO ()
main = httpSink "http://httpbin.org/get" $ \response -> do
    liftIO $ putStrLn
           $ "The status code was: "
          ++ show (getResponseStatusCode response)

    CL.mapM_ (take 5) (S.hPut stdout)

This fails at the (take 5), which suggests, among other things, that I still don't understand how mapping over monads, or liftIO, works.

Also, this resource:

http://haskelliseasy.readthedocs.io/en/latest/#note-on-streaming

...gave me a warning, under "I know what I'm doing and I'd like more fine-grained control over resources, such as streaming", that this is not easily or generally supported.

Other places I looked:

If there's anything in the Haskellverse that makes this easier, more like Python's requests:

response = requests.get(URL, stream=True)
for i,chunk in enumerate(response.iter_content(BLOCK)):
  f.write(chunk)

I'd appreciate the tip there, too, or pointers towards the 2016 state of the art.


Solution

  • You are probably looking for httpSource from the latest version of http-conduit. It behaves pretty much exactly like Python's requests: you get back a stream of chunks.

    save to file

    This is easy: just redirect the source straight into a file sink.

    #!/usr/bin/env stack
    {- stack --install-ghc --resolver nightly-2016-11-26 runghc --package http-conduit -}
    
    {-# LANGUAGE OverloadedStrings #-}
    import Network.HTTP.Simple (httpSource, getResponseBody)
    import Conduit
    
    main = runConduitRes $ httpSource "http://httpbin.org/get" getResponseBody
                        .| sinkFile "data_file"
    

    print only a (take 5) of the byte response

    Once we have the source, we take the first 5 bytes with takeCE 5 and then print these via printC.

    #!/usr/bin/env stack
    {- stack --install-ghc --resolver nightly-2016-11-26 runghc --package http-conduit -}
    
    {-# LANGUAGE OverloadedStrings #-}
    import Network.HTTP.Simple (httpSource, getResponseBody)
    import Conduit
    
    main = runConduitRes $ httpSource "http://httpbin.org/get" getResponseBody
                        .| takeCE 5
                        .| printC
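
    What printC shows here depends on how the response happens to be chunked: takeCE 5 counts bytes across chunks and lets through at most five in total, so the output may be one short ByteString or several. A minimal sketch of that behaviour, assuming an in-memory list of chunks in place of the HTTP body (firstBytes is a made-up helper name and the chunk boundaries are invented for illustration):

    ```haskell
    {-# LANGUAGE OverloadedStrings #-}
    import Conduit
    import Data.ByteString (ByteString)

    -- Hypothetical helper: keep only the first n bytes of a chunked stream.
    -- yieldMany stands in for httpSource; sinkList collects what takeCE lets through.
    firstBytes :: Int -> [ByteString] -> [ByteString]
    firstBytes n chunks = runConduitPure $ yieldMany chunks .| takeCE n .| sinkList

    main :: IO ()
    main = print (firstBytes 5 ["abc", "defgh"])
    ```

    With these two chunks the result should be ["abc","de"]: the first chunk passes through whole and the second is cut after two bytes. If you want a single 5-byte value instead, a fold such as foldC after takeCE 5 would concatenate the pieces.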
    

    save to file and print only a (take 5) of the byte response

    To do this, you want zipSinks or, for more general cases that involve zipping multiple sinks, ZipSink:

    #!/usr/bin/env stack
    {- stack --install-ghc --resolver nightly-2016-11-26 runghc --package http-conduit -}
    
    {-# LANGUAGE OverloadedStrings #-}
    import Network.HTTP.Simple (httpSource, getResponseBody)
    import Data.Conduit.Internal (zipSinks)
    import Conduit
    
    main = runConduitRes $ httpSource "http://httpbin.org/get" getResponseBody
                        .| zipSinks (takeCE 5 .| printC)
                                    (sinkFile "data_file")