haskell text encoding bytestring

How to approach writing a custom decoding function from `ByteString` to `Text`


Suppose I wish to write something like this:

-- | Decode a 'ByteString' containing Code Page 437 encoded text.
decodeCP437 :: ByteString -> Text
decodeCP437 = undefined

(I know about the encoding package, but its dependency list is a ridiculous price to pay for this single and, I believe, quite trivial function.)

My question is how to construct Text from ByteString with reasonable efficiency, in particular without using lists. It seems to me that Data.Text.Encoding should be a good source of inspiration, but at first sight it uses withForeignPtr, and I guess that is too low-level for my use case.

How should the problem be approached? In a nutshell, I guess I need to repeatedly take bytes (Word8) from the ByteString, translate every byte to the corresponding Char, and somehow efficiently build a Text from them. The documented complexity of the basic construction functions in Data.Text indicates, not surprisingly, that appending characters one by one is not the best idea, but I don't see better tools available for this.


Update: I want to create a strict Text. It seems that the only option is to create a builder, get a lazy Text from it (O(n)), and then convert that to a strict Text (O(n)).


Solution

  • You can use the Builder API (Data.Text.Lazy.Builder), which offers O(1) singleton :: Char -> Builder and O(1) (<>) :: Builder -> Builder -> Builder for efficient construction.
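
A minimal sketch of how the Builder approach could look, assuming a recent text and bytestring. The names decodeWith, decodeWithUnfold and cp437Char are hypothetical helpers introduced here for illustration. The cp437Char table is deliberately incomplete: it covers the ASCII range and a few high bytes only, and maps everything else to U+FFFD. A real decoder would need all 128 high entries, and would have to decide whether bytes 0x01 to 0x1F mean control codes (as assumed here) or the CP437 graphic glyphs.

```haskell
module DecodeSketch where

import           Data.ByteString        (ByteString)
import qualified Data.ByteString        as B
import           Data.Text              (Text)
import qualified Data.Text              as T
import qualified Data.Text.Lazy         as TL
import qualified Data.Text.Lazy.Builder as TB
import           Data.Word              (Word8)

-- | Decode any single-byte encoding, given a per-byte translation function.
-- Each byte becomes an O(1) 'TB.singleton'; the builders are joined with (<>),
-- run once into a lazy Text, and then converted to a strict Text.
decodeWith :: (Word8 -> Char) -> ByteString -> Text
decodeWith toChar =
  TL.toStrict . TB.toLazyText . B.foldr step mempty
  where
    step w acc = TB.singleton (toChar w) <> acc

-- | Alternative (not from the answer above): build the strict Text directly
-- with 'T.unfoldrN'. Every byte yields exactly one Char, so the length of the
-- ByteString is a correct upper bound on the length of the result.
decodeWithUnfold :: (Word8 -> Char) -> ByteString -> Text
decodeWithUnfold toChar bs = T.unfoldrN (B.length bs) next bs
  where
    next rest = case B.uncons rest of
      Nothing      -> Nothing
      Just (w, tl) -> Just (toChar w, tl)

-- | Hypothetical, INCOMPLETE CP437 table, for illustration only.
cp437Char :: Word8 -> Char
cp437Char w
  | w < 0x80  = toEnum (fromIntegral w)  -- ASCII range maps to itself
  | otherwise = case w of
      0x80 -> '\x00C7'  -- LATIN CAPITAL LETTER C WITH CEDILLA
      0x81 -> '\x00FC'  -- LATIN SMALL LETTER U WITH DIAERESIS
      0x82 -> '\x00E9'  -- LATIN SMALL LETTER E WITH ACUTE
      0xE1 -> '\x00DF'  -- LATIN SMALL LETTER SHARP S
      _    -> '\xFFFD'  -- placeholder for entries not filled in here

-- | The function from the question, expressed with the helpers above.
decodeCP437 :: ByteString -> Text
decodeCP437 = decodeWith cp437Char
```

With this sketch, decodeCP437 (B.pack [0x48, 0x82]) should produce the two-character Text "Hé". Both passes in the Builder version are linear, so the lazy-to-strict conversion mentioned in the update changes only the constant factor, not the overall O(n) cost. Whether the unfoldrN variant is faster in practice is something to measure rather than assume.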