Tags: haskell, lazy-evaluation, strict, bytestring, chunking

Convert a Lazy ByteString to a strict ByteString


I have a function that takes a lazy ByteString, and I want it to return a list of strict ByteStrings (the laziness should move to the list type of the output).

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
csVals :: L.ByteString -> [B.ByteString]

I want to do this for various reasons: several lexing functions require strict ByteStrings, and I can guarantee that the strict ByteStrings returned by csVals above are very small.

How do I go about "strictifying" ByteStrings without chunking them?
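The question doesn't show the body of csVals, so here is a minimal sketch under a hypothetical assumption: the values are comma-separated. `Data.ByteString.Lazy.Char8.split` consumes the lazy input chunk by chunk, so the list of fields is produced incrementally, and each field (which may straddle a chunk boundary) is flattened into one strict ByteString with `B.concat . L.toChunks`:

```haskell
import qualified Data.ByteString            as B
import qualified Data.ByteString.Lazy       as L
import qualified Data.ByteString.Lazy.Char8 as LC

-- Hypothetical csVals: assumes the values are separated by commas.
-- LC.split yields one lazy ByteString per field, and the resulting
-- list is built up as the input chunks are consumed.
csVals :: L.ByteString -> [B.ByteString]
csVals = map strictify . LC.split ','
  where
    -- A field can span several chunks; B.concat joins them into a
    -- single strict ByteString.
    strictify :: L.ByteString -> B.ByteString
    strictify = B.concat . L.toChunks
```

A field that lies entirely inside one chunk may come back as a slice that shares that chunk's buffer; if keeping the larger chunk alive matters, wrapping the result in `B.copy` gives each small field its own buffer.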

Update0

I want to take a lazy ByteString and make one strict ByteString containing all of its data.


Solution

  • As @sclv said in the comments above, a lazy ByteString is essentially a lazy list of strict ByteStrings (its chunks). There are two approaches to converting a lazy ByteString to a strict one (source: a Haskell mailing-list discussion about adding a toStrict function); the relevant code from that thread is below:

    First, the relevant imports:

    import qualified Data.ByteString               as B
    import qualified Data.ByteString.Internal      as BI
    import qualified Data.ByteString.Lazy          as BL
    import qualified Data.ByteString.Lazy.Internal as BLI
    import           Foreign.ForeignPtr
    import           Foreign.Ptr
    

    Approach 1 (the same as @sclv's suggestion):

    toStrict1 :: BL.ByteString -> B.ByteString
    toStrict1 = B.concat . BL.toChunks
    

    Approach 2:

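    toStrict2 :: BL.ByteString -> B.ByteString
    toStrict2 BLI.Empty = B.empty                 -- no chunks: empty result
    toStrict2 (BLI.Chunk c BLI.Empty) = c         -- one chunk: reuse it, no copy
    toStrict2 lb = BI.unsafeCreate len $ go lb    -- several chunks: one allocation
      where
        -- total length of all chunks, computed with a left fold over the chunks
        len = BLI.foldlChunks (\l sb -> l + B.length sb) 0 lb

        -- copy each chunk into the destination buffer, then advance the pointer;
        -- BI.PS exposes the chunk's buffer, byte offset and length
        go  BLI.Empty                   _   = return ()
        go (BLI.Chunk (BI.PS fp s l) r) ptr = do
            withForeignPtr fp $ \p ->
                BI.memcpy ptr (p `plusPtr` s) (fromIntegral l)
            go r (ptr `plusPtr` l)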
    

    If performance is a concern, I recommend reading the mailing-list thread mentioned above; it also includes criterion benchmarks, in which toStrict2 is faster than toStrict1.
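For reference, bytestring 0.10.0.0 and later export `toStrict` (and `fromStrict`) from `Data.ByteString.Lazy`, so on a current toolchain the conversion does not have to be written by hand. The sketch below assumes such a version and uses made-up input data; it checks that Approach 1 and the library function agree on a lazy ByteString built from three strict chunks:

```haskell
import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy  as BL

-- Approach 1 from above: concatenate the strict chunks.
toStrict1 :: BL.ByteString -> B.ByteString
toStrict1 = B.concat . BL.toChunks

-- Library version; assumes bytestring >= 0.10.0.0.
toStrictLib :: BL.ByteString -> B.ByteString
toStrictLib = BL.toStrict

-- Example input made of three strict chunks (arbitrary data).
sample :: BL.ByteString
sample = BL.fromChunks (map BC.pack ["foo", "bar", "baz"])
```

Either way, when the input has more than one chunk the bytes must be copied into a single contiguous buffer, so the conversion costs time and memory proportional to the total length.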