
HandsomeSoup URL Fetch Issues When URL Contains Square Brackets

The following code does not work when the URL contains []. Could someone tell me how to escape them? I have tried encoding the entire URL, but that did not help.

import qualified Data.ByteString.Char8 as B
import Network.HTTP.Types.URI
import Text.HandsomeSoup
import Text.XML.HXT.Core

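main :: IO ()
main = do
    -- This URL works (no [])
    let url = ""
    --let urlEnc = B.unpack $ urlEncode True $ B.pack url
    -- fromUrl returns an HXT arrow, not an IO action, so bind it with let
    let doc = fromUrl url
    -- Collect the href attribute of every <a> element in the page
    links <- runX $ doc >>> css "a" ! "href"
    mapM_ putStrLn links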

If I change the URL to the one below (note the [0]), it does not work:

    let url = "[0]=hello"

By "working" I mean that I see a list of links when there is no [] in the URL, but when there is, I get nothing: ghci just returns to the prompt. If you copy the second URL and paste it into your browser, however, it works just fine.

I made this example up to illustrate the issue with []. Of course, it is not a valid URL given by Google or anything.

GHC: 7.6.2, Mac OS X (Mavericks)



  • I guess urlEncode does nothing to [ and ] because it should not: they are not special in general.

    You can, however, encode them as %5B and %5D (source), so an alternative would be to replace [ and ] in your URL manually.
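A minimal sketch of that manual replacement, assuming only the brackets need escaping. The helper name escapeBrackets is hypothetical (it is not part of HandsomeSoup or http-types). It deliberately leaves every other character alone: percent-encoding the whole string also escapes characters such as :, /, ? and = that give the URL its structure, which may be why encoding the entire URL did not help.

```haskell
-- Hypothetical helper, not from HandsomeSoup or http-types:
-- percent-encode only the square brackets and keep the rest of the
-- URL (scheme, host, path, '?', '&', '=') exactly as it is.
escapeBrackets :: String -> String
escapeBrackets = concatMap escapeChar
  where
    -- '[' is 0x5B and ']' is 0x5D in ASCII, hence %5B and %5D
    escapeChar '[' = "%5B"
    escapeChar ']' = "%5D"
    escapeChar c   = [c]
```

For example, with a made-up URL, escapeBrackets "http://example.com/search?a[0]=hello" evaluates to "http://example.com/search?a%5B0%5D=hello". In the question's program you would then pass escapeBrackets url to fromUrl instead of url.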