Replace newlines in ByteString


I'd like a function that takes a ByteString and replaces newlines (both \n and \n\r) with commas, but I can't think of a nice way to do it.

import qualified Data.ByteString as BS
import Data.Char (ord) 
import Data.Word (Word8)

endlWord8 = fromIntegral $ ord '\n' :: Word8

replace :: BS.ByteString -> BS.ByteString

I thought of using BS.map, but I can't see how, since I can't pattern match on Word8 values. Another option would be BS.split followed by a join with Word8 commas, but that sounds slow and inelegant. Any ideas?


Solution

  • Use Data.ByteString.Char8 to avoid the tedious Word8/Char conversions you would otherwise have to write. According to the first sentence of the Data.ByteString.Char8 documentation, performance should not be affected.

    Additionally, use B.span instead of B.split, since you want to replace \n\r combinations as well as single \n characters.

    My own (probably clumsy) attempt to do so:

    module Test where
    
    import Data.Monoid ((<>))
    import Data.ByteString.Char8 (ByteString)
    import qualified Data.ByteString.Char8 as B
    import qualified Data.ByteString.Builder as Build
    import qualified Data.ByteString.Lazy as LB
    
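    -- If the string starts with '\n' (optionally followed by '\r'), consume it
    -- and return a comma; otherwise return the string unchanged.
    eatNewline :: ByteString -> (Maybe Char, ByteString)
    eatNewline string
      | B.null string = (Nothing, string)
      -- A lone '\n' at the very end of the input.
      | B.head string == '\n' && B.null (B.tail string) = (Just ',', B.empty)
      -- If the head is '\n' here, the tail is non-empty, so B.head (B.tail string) is safe.
      | B.head string == '\n' && B.head (B.tail string) /= '\r' = (Just ',', B.drop 1 string)
      | B.head string == '\n' && B.head (B.tail string) == '\r' = (Just ',', B.drop 2 string)
      | otherwise = (Nothing, string)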
    
    replaceNewlines :: ByteString -> ByteString
    replaceNewlines = LB.toStrict . Build.toLazyByteString . go mempty
      where
        go :: Build.Builder -> ByteString -> Build.Builder
        go builder string = let (chunk, rest) = B.span (/= '\n') string
                                (c, rest1)    = eatNewline rest
                                maybeComma    = maybe mempty Build.char8 c
                            in if B.null rest1 then
                                 builder <> Build.byteString chunk <> maybeComma
                               else
                                 go (builder <> Build.byteString chunk <> maybeComma) rest1
    

    Hopefully mappend for Data.ByteString.Builder isn't linear in the number of times mappend has already been applied to one of its operands; otherwise this would be a quadratic algorithm.
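
The same idea can probably be written a little more compactly. The sketch below keeps B.span and the Builder, but lets each recursive call return a Builder instead of threading an accumulator, and uses B.uncons to check for the optional '\r'. The names replaceNewlinesAlt and dropNewline are made up for this sketch, and it assumes a GHC recent enough (8.4 or later) that <> is available from the Prelude.

```haskell
import Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Builder as Build
import qualified Data.ByteString.Lazy as LB

-- Hypothetical alternative to replaceNewlines; not part of the original answer.
-- Replaces every "\n" and every "\n\r" with a single comma.
replaceNewlinesAlt :: ByteString -> ByteString
replaceNewlinesAlt = LB.toStrict . Build.toLazyByteString . go
  where
    go :: ByteString -> Build.Builder
    go s = case B.span (/= '\n') s of
      (chunk, rest)
        -- No newline left: emit the remaining bytes and stop.
        | B.null rest -> Build.byteString chunk
        -- rest starts with '\n': emit the chunk, a comma, then continue.
        | otherwise   -> Build.byteString chunk
                           <> Build.char8 ','
                           <> go (dropNewline rest)

    -- Assumes the argument starts with '\n'. Drops it, plus a directly
    -- following '\r' if there is one.
    dropNewline :: ByteString -> ByteString
    dropNewline rest =
      let afterLf = B.drop 1 rest
      in case B.uncons afterLf of
           Just ('\r', afterCr) -> afterCr
           _                    -> afterLf
```

In GHCi, replaceNewlinesAlt (B.pack "a\n\rb\nc") should evaluate to "a,b,c". Note that this follows the \n\r order used in the question; Windows line endings are \r\n, so the check would have to be reversed to handle those.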