
Parsec output on unicode (UTF-8) char


Just need to understand something related to Parsec.

parseTest (many1 alphaNum) "re2re1Δ"
"re2re1\916"
:t parseTest (many1 alphaNum) 
parseTest (many1 alphaNum) :: Text.Parsec.Prim.Stream s Data.Functor.Identity.Identity Char =>
 s -> IO ()

So the Unicode character (the input should be UTF-8, since I am on OS X) is printed as a hex (?) code instead of the Greek delta character. However, putChar does not perform the same conversion in the same ghci session (and the same terminal):

Text.Parsec.Char> putChar 'Δ'
Δ

How come? They should both be just 'Char' types somehow...?


Solution

  • The difference comes down to how show and putChar are implemented.

    λ> show "re2re1Δ"
    "\"re2re1\\916\""
    λ> mapM_ putChar "re2re1Δ"
    re2re1Δ
    

    From the source you can see that the Show instance for Char is defined like this:

    instance  Show Char  where
        showsPrec _ '\'' = showString "'\\''"
        showsPrec _ c    = showChar '\'' . showLitChar c . showChar '\''
    
        showList cs = showChar '"' . showl cs
                     where showl ""       s = showChar '"' s
                           showl ('"':xs) s = showString "\\\"" (showl xs s)
                           showl (x:xs)   s = showLitChar x (showl xs s)
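
    The escaping itself happens in showLitChar, which the instance above calls for every character. A minimal sketch, using only base and assuming a UTF-8 source file, shows what it produces for an ASCII and a non-ASCII character:

    ```haskell
    import Data.Char (ord, showLitChar)
    import Numeric (showHex)

    main :: IO ()
    main = do
      -- The code point of the Greek capital delta, in decimal.
      print (ord 'Δ')                  -- 916
      -- showLitChar escapes characters above ASCII as a backslash
      -- followed by the decimal code point.
      putStrLn (showLitChar 'Δ' "")    -- \916
      -- ASCII letters pass through unchanged.
      putStrLn (showLitChar 'a' "")    -- a
      -- The same code point in hex, for comparison (U+0394).
      putStrLn (showHex (ord 'Δ') "")  -- 394
    ```

    So the number in the escape is decimal, not hex.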
    

    putChar is implemented like this:

    putChar         :: Char -> IO ()
    putChar c       =  hPutChar stdout c
    

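    parseTest internally uses print, which in turn calls show. Because show escapes non-ASCII characters through showLitChar, you see \916, the decimal Unicode code point of Δ (U+0394), instead of the character itself. putChar writes the character to stdout without going through show, so the terminal displays Δ.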
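
    To see the parsed result without the escaping, one option is to skip parseTest, run the parser with parse, and print the resulting String with putStrLn. This is only a sketch: the name parsed is made up for illustration, and it assumes the parsec package is available and that the terminal locale is UTF-8.

    ```haskell
    import Text.Parsec (ParseError, alphaNum, many1, parse)
    import Text.Parsec.String (Parser)

    -- Hypothetical helper: run the parser from the question on the same input.
    parsed :: Either ParseError String
    parsed = parse (many1 alphaNum :: Parser String) "" "re2re1Δ"

    main :: IO ()
    main = case parsed of
      -- ParseError has a Show instance, so print is fine for the error case.
      Left err -> print err
      -- putStrLn writes the characters directly, bypassing show.
      Right s  -> putStrLn s
    ```

    With a UTF-8 locale this should display re2re1Δ rather than "re2re1\916", because the String never passes through show.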