I have a lovely (Scandinavian?) user on my website complaining that I cannot parse his username in URLs, so his page on my website shows him no results.
I am pretty sure that the browser encodes the requests as http://councilroom.com/player?player=G%C3%B6rling
I'd like the player string to become Görling rather than the GÃ¶rling it is currently being converted to.
I am using web.py with Python 2.6 and attempting to parse the URL as follows:
parsed_url = urlparse.urlparse(web.ctx.fullpath)
query_dict = dict(urlparse.parse_qsl(parsed_url.query))
target_player = query_dict['player']
Edit: With the help of unutbu, I fixed this by changing it to
query_dict = dict(urlparse.parse_qsl(web.ctx.env['QUERY_STRING']))
target_player = query_dict['player'].decode('utf-8')
I think web.py was somehow mis-parsing the fullpath in web.ctx, but the QUERY_STRING variable is left untouched.
In [4]: import urlparse
In [6]: parsed_url = urlparse.urlparse('http://councilroom.com/player?player=G%C3%B6rling')
In [7]: parsed_url
Out[7]: ParseResult(scheme='http', netloc='councilroom.com', path='/player', params='', query='player=G%C3%B6rling', fragment='')
In [8]: query_dict = dict(urlparse.parse_qsl(parsed_url.query))
In [9]: query_dict
Out[9]: {'player': 'G\xc3\xb6rling'}
Note the .decode('utf-8'):
In [10]: target_player = query_dict['player'].decode('utf-8')
In [11]: target_player
Out[11]: u'G\xf6rling'
In [12]: print(target_player)
Görling
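The decoding step above could be wrapped in a small helper. This is only a sketch: the name `get_player` is hypothetical, and the import fallback is an addition so the same code also runs on Python 3, where `parse_qsl` lives in `urllib.parse` and already returns decoded text.

```python
try:
    from urllib.parse import parse_qsl  # Python 3 location
except ImportError:
    from urlparse import parse_qsl      # Python 2 location, as used above


def get_player(query_string):
    """Return the 'player' query parameter as text (hypothetical helper)."""
    params = dict(parse_qsl(query_string))
    value = params['player']
    # On Python 2, parse_qsl yields byte strings holding the raw UTF-8 bytes,
    # so they still need decoding. On Python 3 the value is already text.
    if isinstance(value, bytes):
        value = value.decode('utf-8')
    return value


# The raw query string, e.g. what web.ctx.env['QUERY_STRING'] would hold.
target_player = get_player('player=G%C3%B6rling')
```

Here `target_player` ends up as the text Görling on either Python version.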
PS. Somehow, the bytes in the str object 'G\xc3\xb6rling' were being interpreted as a sequence of unicode code points, with the effect of turning Görling into GÃ¶rling:
In [3]: print(u'G\xc3\xb6rling')
GÃ¶rling
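Treating each byte as its own code point is exactly what a Latin-1 decode does, so the effect can be reproduced and also reversed. A minimal sketch, assuming the garbled text came from that kind of mis-decoding; the variable names are illustrative only.

```python
raw = b'G\xc3\xb6rling'             # the UTF-8 bytes from the query string

# Wrong: each byte becomes one code point (0xC3 -> A-tilde, 0xB6 -> pilcrow).
wrong = raw.decode('latin-1')

# Right: the two bytes 0xC3 0xB6 together form the single code point U+00F6.
right = raw.decode('utf-8')

# If only the garbled text is available, the original bytes can be recovered
# by encoding back to Latin-1 and then decoding as UTF-8.
repaired = wrong.encode('latin-1').decode('utf-8')
```

After this runs, `wrong` has eight characters, while `right` and `repaired` both hold the seven-character name.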