html haskell html-parsing content-type non-ascii-characters

Why can Haskell not handle characters from a specific website?

I am trying to write a Haskell program that checks for updates to some novels on demand, using http://www.piaotian.net/html/7/7430/ as an example site. I ran into a problem when displaying its contents (on a Mac running El Capitan). The simple code follows:

import Network.HTTP

openURL :: String -> IO String
openURL = (>>= getResponseBody) . simpleHTTP . getRequest

display :: String -> IO ()
display = (>>= putStrLn) . openURL

Then, when I run display "http://www.piaotian.net/html/7/7430/" in GHCi, some strange characters appear; the first lines look like this:

<title>×ß½øÐÞÏÉ×îÐÂÕÂ½Ú,×ß½øÐÞÏÉÎÞµ¯´°È«ÎÄÔÄ¶Á_Æ®ÌìÎÄÑ§</title>
<meta http-equiv="Content-Type" content="text/html; charset=gbk" />
<meta name="keywords" content="×ß½øÐÞÏÉ,×ß½øÐÞÏÉ×îÐÂÕÂ½Ú,×ß½øÐÞÏÉÎÞµ¯´° Æ®ÌìÎÄÑ§" />
<meta name="description" content="Æ®ÌìÎÄÑ§ÍøÌá¹©×ß½øÐÞÏÉ×îÐÂÕÂ½ÚÃâ·ÑÔÄ¶Á£¬Çë½«×ß½øÐÞÏÉÕÂ½ÚÄ¿Â¼¼ÓÈëÊÕ²Ø·½±ãÏÂ´ÎÔÄ¶Á,Æ®ÌìÎÄÑ§Ð¡ËµÔÄ¶ÁÍø¾¡Á¦ÔÚµÚÒ»Ê±¼ä¸üÐÂÐ¡Ëµ×ß½øÐÞÏÉ£¬Èç·¢ÏÖÎ´¼°Ê±¸üÐÂ£¬ÇëÁªÏµÎÒÃÇ¡£" />
<meta name="copyright" content="×ß½øÐÞÏÉ°æÈ¨ÊôÓÚ×÷ÕßÎáµÀ³¤²»¹Â" />
<meta name="author" content="ÎáµÀ³¤²»¹Â" />
<link rel="stylesheet" href="/scripts/read/list.css" type="text/css" media="all" />
<script type="text/javascript">

I also tried downloading the page to a file as follows:

import Network.HTTP

openURL :: String -> IO String
openURL = (>>= getResponseBody) . simpleHTTP . getRequest

downloading :: String -> IO ()
downloading = (>>= writeFile fileName) . openURL

But after downloading, the file looks like the one in this photo:

If I download the page with Python (using urllib, for example), the characters are displayed normally. Also, if I write a Chinese HTML file myself and parse it, there seems to be no problem. So the problem seems to lie with the website. However, I don't see any difference between the characters on the site and those I write.

Any help in understanding the reason behind this is much appreciated.

P.S.
The Python code is as follows:

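import urllib  # Python 2; in Python 3, urlretrieve lives in urllib.request

theFic = file_path  # destination path; must be set before the download call

urllib.urlretrieve('http://www.piaotian.net/html/7/7430/', theFic)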

The resulting file displays correctly.

Solution

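The strange characters appear because the page is encoded in GBK (see the charset in its meta tag), while the String interface of Network.HTTP turns each byte of the response into one Char without decoding it. Since you said you are interested in just the links, there is no need to convert the GBK encoding to Unicode: the links are plain ASCII, so you can work on the raw bytes.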
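As a rough illustration of where the strange characters in the question come from, the sketch below assumes that every byte of the GBK response ends up as one Char with the same code point, which is how the String body of Network.HTTP behaves as far as I know. The names bytesAsChars and titleStartGBK are made up for this example; the four bytes are the GBK encoding of 走进, the first two characters of the page title.

```haskell
import Data.Char (chr)
import Data.Word (Word8)

-- | Map each byte to the Char with the same code point (a Latin-1 reading).
-- No decoding happens here; that is the point of the illustration.
bytesAsChars :: [Word8] -> String
bytesAsChars = map (chr . fromIntegral)

-- | Hypothetical sample: the first two title characters as GBK bytes,
-- two bytes per Chinese character.
titleStartGBK :: [Word8]
titleStartGBK = [0xD7, 0xDF, 0xBD, 0xF8]
```

In GHCi with a UTF-8 terminal, putStrLn (bytesAsChars titleStartGBK) prints ×ß½ø, which is how the title in the question's output begins.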

Here is a version which prints out all links like "123456.html" in the document:

#!/usr/bin/env stack
{- stack
  --resolver lts-6.0 --install-ghc runghc
  --package wreq --package lens
  --package tagsoup
-}

{-# LANGUAGE OverloadedStrings #-}

import Network.Wreq
import qualified Data.ByteString.Lazy.Char8 as LBS
import Control.Lens
import Text.HTML.TagSoup
import Data.Char
import Control.Monad

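-- match \d+\.html: one or more digits followed by exactly ".html"
isNumberHtml :: LBS.ByteString -> Bool
isNumberHtml lbs = not (LBS.null digits) && rest == ".html"
  where (digits, rest) = LBS.span isDigit lbs

-- keep only <a> tags whose href looks like "123456.html"
wanted :: Tag LBS.ByteString -> Bool
wanted t = isTagOpenName "a" t && isNumberHtml (fromAttrib "href" t)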

main = do
  r <- get "http://www.piaotian.net/html/7/7430/"
  let body = r ^. responseBody :: LBS.ByteString
      tags = parseTags body
      links = filter wanted tags
      hrefs = map (fromAttrib "href") links
  forM_ hrefs LBS.putStrLn
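
If you do need the Chinese text at some point, one possible approach is to save the raw bytes to a file and read them back through a handle whose encoding is set to GBK. This is only a sketch using base: readFileGBK is a made-up helper name, and it assumes that your GHC installation can resolve the encoding name "GBK" (on most Unix-like systems this goes through iconv) and that the file holds the undecoded bytes of the page.

```haskell
import Control.Exception (evaluate)
import System.IO

-- | Hypothetical helper: read a GBK-encoded file into a Unicode String.
readFileGBK :: FilePath -> IO String
readFileGBK path = do
  gbk <- mkTextEncoding "GBK"          -- throws if the encoding is unknown
  withFile path ReadMode $ \h -> do
    hSetEncoding h gbk                 -- decode GBK while reading
    contents <- hGetContents h
    _ <- evaluate (length contents)    -- force the lazy read before h closes
    return contents
```

For example, after saving the response with LBS.writeFile "page.html" body (the file name is just a placeholder), readFileGBK "page.html" >>= putStrLn should show readable Chinese in a UTF-8 terminal. Note that a file written with writeFile from a Network.HTTP String body would not qualify, because the bytes are re-encoded on output.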