I am learning Haskell but having some difficulty understanding how exactly lazy ByteStrings work. Hackage says that "Lazy ByteStrings use a lazy list of strict chunks which makes it suitable for I/O streaming tasks". In contrast, a strict list is stored as one large array.
What are these "chunks" in lazy byteStrings? How does your compiler know just how large a chunk should be? Further, I understand that the idea behind a lazy list is that you don't have to store the entire thing, which thus allows for infinite lists and all of that. But how is this storage implemented? Does each chunk have a pointer to a next chunk?
Many thanks in advance for the help :)
You can find the definition of the lazy ByteString
here:
data ByteString = Empty | Chunk {-# UNPACK #-} !S.ByteString ByteString
deriving (Typeable)
so Chunk
is one data-constructor - the first part is a strict (!
) strict (S.
) ByteString
and then some more Chunks
or Empty
via the second recursive (lazy) ByteString
part.
Note that the second part does not have the (!)
there - so this can be a GHC thunk (the lazy stuff in Haskell) that will only be forced when you need it (for example pattern-match on it).
That means a lazy ByteString
is either Empty
or you get a strict (you can think of this as already loaded if you want) part or chunk of the complete string with a lazy remaining/rest/tail ByteString
.
As about the size that depends on the code that is generating this lazy bytestring - the compiler does not come into this.
You can see this for hGetContents
:
hGetContents = hGetContentsN defaultChunkSize
where defaultChunkSize
is defined to be 32 * 1024 - 2 * sizeOf (undefined :: Int)
- so a bit less than 32kB
And yes the rest (snd. argument to Chunk
) can be seen as a pointer to the next Chunk
or Empty
(just like with a normal list).