Search code examples
haskelllazy-evaluationbytestring

What makes a Bytestring "lazy"?


I am learning Haskell but having some difficulty understanding how exactly lazy ByteStrings work. Hackage says that "Lazy ByteStrings use a lazy list of strict chunks which makes it suitable for I/O streaming tasks". In contrast, a strict list is stored as one large array.

What are these "chunks" in lazy byteStrings? How does your compiler know just how large a chunk should be? Further, I understand that the idea behind a lazy list is that you don't have to store the entire thing, which thus allows for infinite lists and all of that. But how is this storage implemented? Does each chunk have a pointer to a next chunk?

Many thanks in advance for the help :)


Solution

  • You can find the definition of the lazy ByteString here:

    data ByteString = Empty | Chunk {-# UNPACK #-} !S.ByteString ByteString
        deriving (Typeable)
    

    so Chunk is one data-constructor - the first part is a strict (!) strict (S.) ByteString and then some more Chunks or Empty via the second recursive (lazy) ByteString part.

    Note that the second part does not have the (!) there - so this can be a GHC thunk (the lazy stuff in Haskell) that will only be forced when you need it (for example pattern-match on it).

    That means a lazy ByteString is either Empty or you get a strict (you can think of this as already loaded if you want) part or chunk of the complete string with a lazy remaining/rest/tail ByteString.

    As about the size that depends on the code that is generating this lazy bytestring - the compiler does not come into this.

    You can see this for hGetContents:

    hGetContents = hGetContentsN defaultChunkSize
    

    where defaultChunkSize is defined to be 32 * 1024 - 2 * sizeOf (undefined :: Int) - so a bit less than 32kB

    And yes the rest (snd. argument to Chunk) can be seen as a pointer to the next Chunk or Empty (just like with a normal list).