How to parse a utf-8 encoded query parameter with Python 2.6


A (Scandinavian?) user of my website is complaining that I cannot parse his username in URLs, so his page on the site shows no results.

I am pretty sure that the browser encodes the requests as http://councilroom.com/player?player=G%C3%B6rling

I'd like the player string to become Görling rather than the GÃ¶rling it is currently being converted to.

I am using web.py with Python 2.6 and attempting to parse the URL as follows:

parsed_url = urlparse.urlparse(web.ctx.fullpath)
query_dict = dict(urlparse.parse_qsl(parsed_url.query))
target_player = query_dict['player']

Edit: With the help of unutbu, I fixed this by changing it to

query_dict = dict(urlparse.parse_qsl(web.ctx.env['QUERY_STRING']))
target_player = query_dict['player'].decode('utf-8')

I think web.py was somehow mis-parsing the fullpath in web.ctx, but the QUERY_STRING variable is left untouched.
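
The fix from the edit above can be wrapped in a small helper. This is only a sketch: `get_query_param` is a hypothetical name, not part of web.py or the standard library, and the import fallback is an assumption added so the same code also runs on Python 3, where `parse_qsl` already returns decoded text.

```python
try:
    from urlparse import parse_qsl        # Python 2.6+
except ImportError:
    from urllib.parse import parse_qsl    # Python 3 (assumed fallback)


def get_query_param(query_string, name, default=None):
    """Return one query parameter as unicode text (hypothetical helper)."""
    params = dict(parse_qsl(query_string))
    value = params.get(name, default)
    # On Python 2, parse_qsl yields percent-decoded byte strings, so the
    # UTF-8 bytes still have to be decoded into a unicode object.
    if isinstance(value, bytes):
        value = value.decode('utf-8')
    return value


player = get_query_param('player=G%C3%B6rling', 'player')
```

In the web.py handler this would presumably be called as `get_query_param(web.ctx.env['QUERY_STRING'], 'player')`.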


Solution

    In [4]: import urlparse
    
    In [6]: parsed_url = urlparse.urlparse('http://councilroom.com/player?player=G%C3%B6rling')
    
    In [7]: parsed_url
    Out[7]: ParseResult(scheme='http', netloc='councilroom.com', path='/player', params='', query='player=G%C3%B6rling', fragment='')
    
    In [8]: query_dict = dict(urlparse.parse_qsl(parsed_url.query))
    
    In [9]: query_dict
    Out[9]: {'player': 'G\xc3\xb6rling'}
    

    Note the .decode('utf-8'):

    In [10]: target_player = query_dict['player'].decode('utf-8')
    
    In [11]: target_player
    Out[11]: u'G\xf6rling'
    
    In [12]: print(target_player)
    Görling
    

    PS. Somehow, the bytes in the str object 'G\xc3\xb6rling' were being interpreted as a sequence of Unicode code points, which has the effect of turning Görling into GÃ¶rling:

    In [3]: print(u'G\xc3\xb6rling')
    GÃ¶rling
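
The effect described in the PS can be reproduced by decoding the same bytes two ways. This is a small sketch that assumes Latin-1 as the wrong codec, since it maps each byte to the code point with the same number, which matches the `u'G\xc3\xb6rling'` example. The variable names are illustrative only.

```python
raw = b'G\xc3\xb6rling'              # UTF-8 bytes, as left after percent-decoding

correct = raw.decode('utf-8')        # the two bytes \xc3\xb6 become one code point, U+00F6
mojibake = raw.decode('latin-1')     # each byte becomes its own code point

assert correct == u'G\xf6rling'
assert mojibake == u'G\xc3\xb6rling'
assert len(correct) == 7 and len(mojibake) == 8

# If text has already been mis-decoded this way, the round trip recovers it.
repaired = mojibake.encode('latin-1').decode('utf-8')
assert repaired == correct
```

The round trip only works when the wrong decoding was lossless, as it is with Latin-1. Decoding the raw bytes as UTF-8 in the first place remains the cleaner fix.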