using haskell pipes-bytestring to iterate a file by line


I am using the pipes library and need to convert a ByteString stream to a stream of lines (i.e. String), using ASCII encoding. I am aware that there are other libraries (Pipes.Text and Pipes.Prelude) that perhaps let me yield lines from a text file more easily, but because of some other code I need to be able to get lines as String from a Producer of ByteString.

More formally, I need to convert a Producer ByteString IO () to a Producer String IO (), which yields lines.

I am sure this must be a one-liner for an experienced pipes programmer, but so far I have not managed to work through all the FreeT and lens trickery in pipes-bytestring.

Any help is much appreciated!

Stephan


Solution

  • If you need that type signature, then I would suggest this:

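    import Control.Foldl (mconcat, purely)
    import Data.ByteString (ByteString)
    import Data.Text (unpack)
    import Lens.Family (view)
    import Pipes (Producer, (>->))
    import Pipes.Group (folds)
    import qualified Pipes.Prelude as Pipes
    import Pipes.Text (lines)
    import Pipes.Text.Encoding (utf8)
    -- Also hide mconcat: newer versions of base export it from Prelude,
    -- which would clash with the Fold imported from Control.Foldl.
    import Prelude hiding (lines, mconcat)
    
    -- Decode the bytes as UTF-8, split the Text stream into lines,
    -- concatenate the chunks of each line, and unpack each line to a String.
    -- The returned Producer holds any bytes that could not be decoded.
    getLines
        :: Producer ByteString IO r -> Producer String IO (Producer ByteString IO r)
    getLines p = purely folds mconcat (view (utf8 . lines) p) >-> Pipes.map unpack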
    

    This works because the type of purely folds mconcat is:

    purely folds mconcat
        :: (Monad m, Monoid t) => FreeT (Producer t m) m r -> Producer t m r
    

    ... where t in this case would be Text:

    purely folds mconcat
        :: Monad m => FreeT (Producer Text m) m r -> Producer Text m r
    

    Any time you want to reduce each Producer sub-group of a FreeT-delimited stream, you probably want to use purely folds. Then it's just a matter of picking the right Fold to reduce each sub-group with. In this case, you just want to concatenate all the Text chunks within a group, so you pass in mconcat. I generally don't recommend doing this, since it loads each entire line into memory and will break on extremely long lines, but you specified that you needed this behavior.

    This is verbose because the pipes ecosystem promotes Text over String and also tries to encourage handling arbitrarily long lines. If you were not constrained by your other code, then the more idiomatic approach would just be:

    view (utf8 . lines)
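
To see the getLines function from the answer in action, here is a minimal sketch that feeds it a few in-memory ByteString chunks whose boundaries deliberately fall in the middle of lines. It assumes a pipes-text version in which lines is a lens (as the answer's code relies on), and it hides mconcat from Prelude so the Fold from Control.Foldl is unambiguous. The names sampleChunks and demoLines are made up for this illustration. Since ASCII is a subset of UTF-8, the utf8 codec also covers the ASCII input from the question.

```haskell
module Main (main) where

import Control.Foldl (mconcat, purely)
import Control.Monad (void)
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.Text (unpack)
import Lens.Family (view)
import Pipes (Producer, each, (>->))
import Pipes.Group (folds)
import qualified Pipes.Prelude as Pipes
import Pipes.Text (lines)
import Pipes.Text.Encoding (utf8)
import Prelude hiding (lines, mconcat)

-- Same definition as in the answer: decode, split into lines,
-- concatenate the chunks of each line, unpack to String.
getLines
    :: Producer ByteString IO r -> Producer String IO (Producer ByteString IO r)
getLines p = purely folds mconcat (view (utf8 . lines) p) >-> Pipes.map unpack

-- Hypothetical input: three chunks that split lines at arbitrary points,
-- the way chunks read from a file handle might.
sampleChunks :: Producer ByteString IO ()
sampleChunks = each (map BC.pack ["alpha\nbe", "ta\ngam", "ma\n"])

-- Collect every line into a list. 'void' discards the leftover
-- Producer of undecoded bytes that getLines returns.
demoLines :: IO [String]
demoLines = Pipes.toListM (void (getLines sampleChunks))

-- Print one reassembled line per output line.
main :: IO ()
main = demoLines >>= mapM_ putStrLn
```

For a real file, sampleChunks could be swapped for a Producer such as Pipes.ByteString.fromHandle applied to an open handle. Collecting into a list is only for demonstration; in a streaming program you would normally pipe the lines into a Consumer instead.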