Split ByteString on a ByteString (instead of a Word8 or Char)


I know Data.ByteString.Lazy already has a function that splits a ByteString (here, a CSV file) on a single byte:

split :: Word8 -> ByteString -> [ByteString]

But I want to split on a multi-character ByteString (like splitting on a String instead of a Char):

split :: ByteString -> ByteString -> [ByteString]

I have multi-character separators in a csv-like text file that I need to parse, and the individual characters themselves appear in some of the fields, so choosing just one separator character and discarding the others would contaminate the data import.

I've had some ideas on how to do this, but they seem kind of hacky (e.g. take three Word8s, test if they're the separator combination, start a new field if they are, recurse further), and I imagine I would be reinventing a wheel anyway. Is there a way to do this without rebuilding the function from scratch?


Solution

  • The documentation for breakSubstring in the bytestring package contains a function that does what you are asking for:

    import qualified Data.ByteString as B

    tokenise x y = h : if B.null t then [] else tokenise x (B.drop (B.length x) t)
        where (h,t) = B.breakSubstring x y
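Since the question is about Data.ByteString.Lazy while breakSubstring lives in the strict Data.ByteString module, here is a minimal sketch of how the same idea might be wrapped for lazy input. The names splitOn and splitOnLazy are made up for this illustration and are not part of the bytestring package, and the empty-separator guard is an addition of mine rather than something the documentation's tokenise does. Note that the lazy wrapper converts the whole input to a strict ByteString first, so it assumes the file fits in memory.

```haskell
module Main where

import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy  as BL

-- Hypothetical helper: split a strict ByteString on every occurrence
-- of a multi-byte separator, using breakSubstring as in tokenise.
splitOn :: B.ByteString -> B.ByteString -> [B.ByteString]
splitOn sep input
  | B.null sep = [input]   -- guard (my assumption): an empty separator never consumes input
  | otherwise  = go input
  where
    go s =
      let (h, t) = B.breakSubstring sep s
      in  h : if B.null t
                then []
                else go (B.drop (B.length sep) t)

-- Hypothetical lazy wrapper: forces the whole input into memory,
-- splits it strictly, then converts each field back to a lazy ByteString.
splitOnLazy :: BL.ByteString -> BL.ByteString -> [BL.ByteString]
splitOnLazy sep =
  map BL.fromStrict . splitOn (BL.toStrict sep) . BL.toStrict

main :: IO ()
main = do
  let sep   = C.pack "|~|"
      input = C.pack "a|b|~|c~d|~|"
  -- The separator's individual characters ('|' and '~') appear inside
  -- the fields, but only the full three-byte sequence splits them.
  print (splitOn sep input)
  print (splitOnLazy (BL.fromStrict sep) (BL.fromStrict input))
```

Both calls should print ["a|b","c~d",""]: a trailing separator produces a trailing empty field, much as splitting on a single byte does. If the input is too large to hold strictly, this approach does not apply as written and a streaming parser would be a better fit.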