haskell, utf-8, hxt

Encode strings parsed by HXT to proper UTF8 String


I am parsing UTF-8-encoded pages using hxt. Here is a simplified example of the parser:

{-# LANGUAGE Arrows #-}
import Text.XML.HXT.Core

names = multi (hasName "h1") >>> proc h1 -> do
  name <- getText <<< getChildren -< h1
  returnA -< name

Everything works fine until I try to print the names:

*Main > n
"\208\152\208\182\208\190\209\128\208\176-\208\161"
*Main > :t n
n :: String
*Main > putStrLn n
ÐжоÑа-С
*Main > Data.Text.IO.putStrLn $ Data.Text.pack n
ÐжоÑа-С

I am parsing with the option withInputEncoding "utf8". How can I get properly encoded strings out of hxt?


Solution

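  • Use decodeUtf8 from Data.Text.Encoding in combination with pack from Data.ByteString.Char8. The String shown above holds one Char per UTF-8 byte, so pack turns it back into raw bytes and decodeUtf8 decodes those bytes into text: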

    *Main > import qualified Data.Text.Encoding as E
    *Main > import qualified Data.ByteString.Char8 as C
    *Main > import qualified Data.Text.IO as T
    
    *Main > T.putStrLn . E.decodeUtf8 . C.pack $ n
    
    Ижора-С
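
The same conversion can be wrapped in a small helper for use outside GHCi. This is only a sketch: the name fixUtf8 is made up for illustration, and the fallback behaviour is an assumption about what is convenient, not something hxt provides. It re-decodes only when every Char fits in a single byte and the bytes form valid UTF-8; otherwise it returns the string unchanged, since C.pack would silently truncate any Char above 255.

```haskell
import qualified Data.ByteString.Char8 as C
import qualified Data.Text as T
import qualified Data.Text.Encoding as E

-- | Hypothetical helper: reinterpret a String whose Chars are really
-- UTF-8 bytes (one Char per byte) as properly decoded Text.
-- If the input does not look like byte-per-Char UTF-8, it is packed as-is.
fixUtf8 :: String -> T.Text
fixUtf8 s
  -- Every Char must fit in one byte, and the bytes must be valid UTF-8.
  | all (<= '\255') s
  , Right t <- E.decodeUtf8' (C.pack s) = t
  -- Already-decoded or non-UTF-8 input: leave it untouched.
  | otherwise = T.pack s
```

With the imports from the session above, Data.Text.IO.putStrLn (fixUtf8 n) should print the same text as the one-liner, provided the terminal locale is UTF-8. Plain ASCII strings pass through unchanged because ASCII is valid UTF-8.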