Tags: haskell, bytestring, attoparsec

Does Data.Attoparsec.ByteString do zero-copy parsing?


Take for example takeWhile. Internally it uses span.

Does that mean the result just references the input bytestring? If not, is there a way to achieve this?

The motivating use case is a large (> 2 GB) file that I want to map into memory, extracting bytestrings that point into the mapped memory.


Solution

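  • Yes. All substring-like operations on strict ByteStrings (take, drop, span and so on) are O(1), as the documentation states: the result is a shallow copy that shares the original buffer and differs only in offset and length. The flip side is that every such slice keeps the whole original buffer alive. If you don't want that, use copy to get a full copy of the parsed results, so that the original huge string can be garbage-collected.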

    Additionally, consider the lazy variants of mmap and attoparsec (for example mmapFileByteStringLazy from System.IO.MMap and Data.Attoparsec.ByteString.Lazy); they may be a better fit when the large input is parsed sequentially from start to end.
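
The sharing described in this answer can be checked empirically. The following is a minimal sketch, assuming the attoparsec and bytestring packages and a single strict input handed to parseOnly. The names keyValue, startAddr and pointsInto are hypothetical helpers introduced here for illustration; they are not part of either library. The sketch compares raw addresses to see whether a field returned by takeWhile lies inside the input buffer, and whether a copy of that field does.

```haskell
module Main (main) where

import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Unsafe (unsafeUseAsCString)
import Foreign.C.String (CString)
import Foreign.Ptr (minusPtr)

-- | Hypothetical example parser: a "key=value" pair terminated by a newline
-- (or by end of input). Both fields are produced by takeWhile.
keyValue :: A.Parser (B.ByteString, B.ByteString)
keyValue = do
  k <- A.takeWhile (/= '=')
  _ <- A.char '='
  v <- A.takeWhile (/= '\n')
  return (k, v)

-- | Address of the first payload byte of a strict ByteString.
-- The pointer is only compared numerically, never dereferenced.
-- Do not use this on empty ByteStrings.
startAddr :: B.ByteString -> IO CString
startAddr bs = unsafeUseAsCString bs return

-- | True if the bytes of 'piece' lie entirely inside the memory of 'whole'.
pointsInto :: B.ByteString -> B.ByteString -> IO Bool
pointsInto piece whole = do
  p <- startAddr piece
  w <- startAddr whole
  let off = p `minusPtr` w
  return (off >= 0 && off + B.length piece <= B.length whole)

main :: IO ()
main = do
  -- Stand-in for the memory-mapped file contents.
  let input = B.pack "name=attoparsec\nrest of a large input"
  case A.parseOnly keyValue input of
    Left err -> putStrLn ("parse error: " ++ err)
    Right (k, v) -> do
      let kCopy = B.copy k  -- fresh buffer, independent of 'input'
      shared   <- k `pointsInto` input
      detached <- kCopy `pointsInto` input
      putStrLn ("parsed: " ++ show (k, v))
      putStrLn ("key shares the input buffer:  " ++ show shared)
      putStrLn ("copy shares the input buffer: " ++ show detached)
```

If takeWhile returns a slice, as the answer states, the first check reports True. The copy always lives in freshly allocated memory, so holding only copies lets the original buffer be released. Note that this check covers a single strict chunk only: with incremental input, a result that spans a chunk boundary cannot be a slice of either chunk and therefore has to be copied.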