With the following code, I want to serialize a Data.Text value to a ByteString. Unfortunately my text is prepended with unnecessary NUL bytes and an EOT byte:
GHCi, version 9.4.4: https://www.haskell.org/ghc/ :? for help
ghci> import qualified Data.Text as T
ghci> import Data.Binary
ghci> import Data.Binary.Put
ghci> let txt = T.pack "Text"
ghci> runPut $ put txt
PS: In the real code I put the length in front of the text:
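foo :: Text -> ByteString
foo txt = runPut $ do
    -- T.length returns an Int, so convert it to the Word32 that putWord32host expects
    putWord32host $ fromIntegral (T.length txt)
    put txt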
The put for Text already encodes the length in the binary string. Indeed, if we look at the source code for the Text instance of Binary, we see [src]:
instance Binary Text where
    put t = put (encodeUtf8 t)
    get = do
      bs <- get
      case decodeUtf8' bs of
        P.Left exn -> P.fail (P.show exn)
        P.Right a -> P.return a
That's not much of a surprise: we encode the Text to UTF-8, which produces a ByteString, and then use put on that one. But the length is added when we put the ByteString itself. Indeed, the ByteString instance of Binary looks like [src]:
instance Binary B.ByteString where
    put bs = put (B.length bs) <> putByteString bs
    get = get >>= getByteString
The put for the ByteString produced by encodeUtf8 thus writes eight bytes to specify the size of the ByteString. These are the NUL bytes and the EOT byte (value 4) in front of your text. Note that this is the number of bytes, which is not necessarily the same as the number of characters in the Text.
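To make the prefix visible, here is a minimal sketch that dumps the raw bytes produced by the default instance. The helper name defaultBytes is made up for illustration; it only combines put, runPut and BL.unpack.

```haskell
import Data.Binary (put)
import Data.Binary.Put (runPut)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import Data.Word (Word8)

-- Hypothetical helper: serialize with the default Binary Text instance
-- and return the resulting bytes as a plain list.
defaultBytes :: T.Text -> [Word8]
defaultBytes = BL.unpack . runPut . put

-- Four ASCII characters: the prefix says 4, followed by the 4 payload bytes.
--   defaultBytes (T.pack "Text")
--     == [0,0,0,0,0,0,0,4,84,101,120,116]
--
-- One non-ASCII character (U+00E9): the prefix says 2, because the
-- UTF-8 encoding takes two bytes although T.length is 1.
--   defaultBytes (T.pack "\233")
--     == [0,0,0,0,0,0,0,2,195,169]
```

The second case is why the prefix should be read as a byte count rather than a character count.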
If you want the same effect, but without the length prefix, you can use:
import Data.Text.Encoding
runPut (putByteString (encodeUtf8 txt))
This omits the length header.
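If you do want your own, shorter length header (as in the PS of the question), one possible sketch is below. All four function names are hypothetical, and two choices are assumptions rather than what the question does: it uses putWord32be instead of putWord32host so that the output does not depend on the machine's endianness, and it stores the number of UTF-8 bytes rather than T.length, so that the decoder knows how many bytes to read.

```haskell
import Data.Binary.Get (Get, getByteString, getWord32be, runGet)
import Data.Binary.Put (Put, putByteString, putWord32be, runPut)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)

-- Hypothetical encoder: 4-byte big-endian byte count, then the UTF-8 payload.
putText32 :: T.Text -> Put
putText32 txt = do
  let bytes = encodeUtf8 txt
  putWord32be (fromIntegral (B.length bytes))
  putByteString bytes

-- Hypothetical decoder: read the byte count, then exactly that many bytes,
-- and fail the parse if they are not valid UTF-8.
getText32 :: Get T.Text
getText32 = do
  n <- getWord32be
  bytes <- getByteString (fromIntegral n)
  either (fail . show) pure (decodeUtf8' bytes)

encodeText32 :: T.Text -> BL.ByteString
encodeText32 = runPut . putText32

-- Note: runGet calls error on malformed input; runGetOrFail is the safer choice
-- for untrusted data.
decodeText32 :: BL.ByteString -> T.Text
decodeText32 = runGet getText32

-- BL.unpack (encodeText32 (T.pack "Text")) == [0,0,0,4,84,101,120,116]
```

With this framing the header shrinks from eight bytes to four, at the cost of limiting the payload to what fits in a Word32.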