Converting ByteString Generated by System.Entropy to Text


When I run this code, I get a decode error from Data.Text. What am I doing wrong?

import Data.Text                    (Text, pack, unpack)
import Data.Text.Encoding           (decodeUtf8)
import Data.ByteString              (ByteString)
import System.Entropy

randBS :: IO ByteString 
randBS = do
    randBytes <- getEntropy 2048  
    return randBytes

main :: IO ()
main = do
    r <- randBS
    putStrLn $ unpack $ decodeUtf8 r 

Runtime Error:

Cannot decode byte '\xc4': Data.Text.Internal.Encoding.Fusion.streamUtf8:
Invalid UTF-8 stream

I would like to generate some random bytes that will be used as an auth token.

I am on Mac OS X (Yosemite) and GHC Version 7.10.1


Solution

  • randBS returns random bytes, not UTF-8 encoded data. What you have is not a representation of Text, so whichever decoding function you use, you will run into a decoding error. If you really want to go through Text, you'll have to use something like decodeUtf8With with an error handler that replaces each invalid byte with the character that has the same code point.

    Something like:

    import Data.Text                    (Text, pack, unpack)
    import Data.Text.Encoding           (decodeUtf8With)
    import Data.ByteString              (ByteString)
    import Data.Char                    (chr)
    import Control.Applicative          ((<$>))
    import System.Entropy
    
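    -- Ignore the error description; map the offending byte (if any)
    -- to the Char with the same code point.
    handler _ x = chr . fromIntegral <$> x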
    
    randBS :: IO ByteString 
    randBS = do
        randBytes <- getEntropy 2048  
        return randBytes
    
    main :: IO ()
    main = do
        r <- randBS
        putStrLn $ unpack $ decodeUtf8With handler r 
    

    Not tested; I don't have GHC installed at the moment :s


    Probably even better is to use hexadecimal encoding instead of UTF-8 plus an error handler. You can do so with the base16-bytestring library. You'd first use encode :: ByteString -> ByteString to obtain a representation containing only ASCII characters, which decodeUtf8 can then decode without errors:

    import Data.Text                    (Text, pack, unpack)
    import Data.ByteString              (ByteString)
    import Data.Text.Encoding           (decodeUtf8)
    import Data.ByteString.Base16       (encode)
    import System.Entropy
    
    --- ... randBS as before
    
    main = do
        r <- randBS
        putStrLn $ unpack $ decodeUtf8 $ encode r
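
To see both points side by side, here is a minimal sketch that needs no extra packages. It is not the code from the answer above. It assumes a fixed three-byte sample in place of getEntropy, and it uses byteStringHex from bytestring's own Data.ByteString.Builder as a stand-in for base16-bytestring's encode. The helper names toHexText and isValidUtf8 are made up for illustration. decodeUtf8' is the variant of decodeUtf8 that returns an Either instead of throwing.

```haskell
import qualified Data.ByteString         as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy    as BL
import           Data.Text               (Text)
import qualified Data.Text               as T
import           Data.Text.Encoding      (decodeUtf8, decodeUtf8')

-- Hypothetical helper: hex-encode the bytes and view the result as Text.
-- Hex output is plain ASCII, so decodeUtf8 cannot fail on it.
toHexText :: BS.ByteString -> Text
toHexText = decodeUtf8 . BL.toStrict . B.toLazyByteString . B.byteStringHex

-- Hypothetical helper: True when the bytes happen to be valid UTF-8.
isValidUtf8 :: BS.ByteString -> Bool
isValidUtf8 = either (const False) (const True) . decodeUtf8'

main :: IO ()
main = do
    -- Fixed sample standing in for getEntropy (an assumption of this sketch);
    -- 0xc4 is the byte reported in the error message above.
    let sample = BS.pack [0xc4, 0x00, 0xff]
    print (isValidUtf8 sample)              -- False
    putStrLn (T.unpack (toHexText sample))  -- c400ff
```

The sample is rejected because 0xc4 starts a two-byte UTF-8 sequence and the next byte is not a continuation byte. Random bytes fail this way almost every time, whereas the hex form is always decodable, at the cost of doubling the length.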