Difference between Data.ByteString and Data.ByteString.Char8


I read that Data.ByteString.Char8 only supports ASCII characters and is dangerous to use if your data contains other Unicode characters. However, this program:

{-# LANGUAGE OverloadedStrings #-}

--import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text.IO as TIO
import qualified Data.Text.Encoding as E
import qualified Data.Text as T

name :: T.Text
name = "{ \"name\": \"哈时刻\" }"

nameB :: BC.ByteString
nameB = E.encodeUtf8 name

main :: IO ()
main = do
  BC.writeFile "test.json" nameB
  putStrLn "done"

produces the same result as this one:

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
--import qualified Data.ByteString.Char8 as BC
import qualified Data.Text.IO as TIO
import qualified Data.Text.Encoding as E
import qualified Data.Text as T

name :: T.Text
name = "{ \"name\": \"哈时刻\" }"

nameB :: B.ByteString
nameB = E.encodeUtf8 name

main :: IO ()
main = do
  B.writeFile "test.json" nameB
  putStrLn "done"

So what is the difference between using Data.ByteString.Char8 and Data.ByteString?


Solution

  • If you compare Data.ByteString and Data.ByteString.Char8, you'll notice that a bunch of functions that reference Word8 in the former reference Char in the latter.

    -- Data.ByteString
    map :: (Word8 -> Word8) -> ByteString -> ByteString
    cons :: Word8 -> ByteString -> ByteString
    snoc :: ByteString -> Word8 -> ByteString
    head :: ByteString -> Word8
    uncons :: ByteString -> Maybe (Word8, ByteString) 
    {- and so on... -}
    
    
    -- Data.ByteString.Char8
    map :: (Char -> Char) -> ByteString -> ByteString
    cons :: Char -> ByteString -> ByteString
    snoc :: ByteString -> Char -> ByteString
    head :: ByteString -> Char
    uncons :: ByteString -> Maybe (Char, ByteString) 
    {- and so on... -}
    

    For these functions, and these functions only, Data.ByteString.Char8 provides the convenience of not having to constantly convert Word8 values to and from Char values. The ByteString type itself is the same in both modules, and writeFile does exactly the same thing in both.

    Here is a nice way of seeing the different behaviours of similar functions in Text, ByteString, and ByteString.Char8:

    {-# LANGUAGE OverloadedStrings #-}
    
    import Data.Text.Encoding
    
    import qualified Data.Text as T
    import qualified Data.ByteString as B
    import qualified Data.ByteString.Char8 as BC
    
    nameText :: T.Text
    nameText = "哈时刻"
    
    nameByteString :: B.ByteString
    nameByteString = encodeUtf8 nameText
    
    main :: IO ()
    main = do
      print $ T.head nameText               -- '\21704'     actual first character
      print $ B.head nameByteString         -- 229          first byte
      print $ BC.head nameByteString        -- '\229'       first byte as character
    
      putStrLn [ T.head nameText ]          -- 哈           actual first character
      putStrLn [ BC.head nameByteString ]   -- å            first byte as character
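
The place where Char8 actually becomes dangerous is when a Char goes in rather than comes out: the Data.ByteString.Char8 functions that accept a Char keep only its low 8 bits. Here is a minimal sketch of that, assuming the bytestring and text packages and a UTF-8 source file. The helper names packedViaChar8 and packedViaUtf8 are made up for this illustration and are not part of either library.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as E
import Data.Word (Word8)

-- Hypothetical helper: pack a String through Char8 and look at the raw bytes.
-- Each Char is truncated to its low 8 bits, so non-Latin-1 characters are lost.
packedViaChar8 :: String -> [Word8]
packedViaChar8 = B.unpack . BC.pack

-- Hypothetical helper: go through Text and encodeUtf8 instead.
-- Every code point survives as one or more UTF-8 bytes.
packedViaUtf8 :: String -> [Word8]
packedViaUtf8 = B.unpack . E.encodeUtf8 . T.pack

main :: IO ()
main = do
  print (packedViaChar8 "哈")                           -- [200]          U+54C8 truncated to 0xC8
  print (packedViaUtf8 "哈")                            -- [229,147,136]  the real UTF-8 encoding
  print (packedViaChar8 "abc" == packedViaUtf8 "abc")   -- True           ASCII is unaffected
```

In the question's program the bytes come from encodeUtf8 and are only written out with writeFile, so no Char ever passes through a Char8 function and both versions write the same file.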