
Python: decoding non-English text to use in a URL?


I have a variable such as title:

title = "révolution_essentielle"

I could encode and decode it like this for other purposes:

title1 = unicode(title, encoding = "utf-8")

But how do I preserve the non-English characters and use the title as part of a URL string? Ideally, I want https://mainurl.com/révolution_essentielle.html, built by concatenating several strings, including title, like this:

url = main_url + "/" + title + ".html"

Could anyone kindly show me how to do that? Thanks a bunch!


Solution

  • To summarize what we've talked about in the comments: there is a function for quoting URLs (replacing special characters with %-prefixed escape sequences).

    For Python 2 (as used in this case), it's urllib.quote(), which can be used as follows:

    import urllib
    urllib.quote("révolution_essentielle")
    

    When our input is a unicode object containing non-ASCII characters, we also need to encode it first, e.g.:

    urllib.quote(u'hey_there_who_likes_lego_that\xe3\u019\xe2_\xe3_...'.encode('utf8'))
    

    Beware, though: make sure the encoding you choose matches the one the receiving server expects and understands.


    If we were talking Python 3, the equivalent function would be urllib.parse.quote():

    import urllib.parse
    urllib.parse.quote("révolution_essentielle")
    

    It accepts str (Unicode) arguments as well as already-encoded values in bytes objects.
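
To tie this back to the URL in the question, here is a minimal Python 3 sketch. The helper name build_url and its encoding parameter are made up for illustration, and passing safe="" (so that a "/" inside the title is escaped too) is an assumption about what is wanted, not something the question specifies. The last lines show why the encoding has to match what the server expects: the same title produces different escape sequences under UTF-8 and Latin-1.

```python
from urllib.parse import quote, unquote


def build_url(main_url, title, encoding="utf-8"):
    """Join main_url and a percent-encoded title (a str) into a page URL.

    Hypothetical helper for illustration; title must be str here, because
    quote() rejects an explicit encoding when given bytes.
    """
    # safe="" escapes "/" as well, so the title stays a single path segment.
    return main_url + "/" + quote(title, safe="", encoding=encoding) + ".html"


main_url = "https://mainurl.com"  # placeholder host taken from the question
title = "révolution_essentielle"

url = build_url(main_url, title)
print(url)  # https://mainurl.com/r%C3%A9volution_essentielle.html

# quote() gives the same result for a str and for its UTF-8 encoded bytes.
print(quote(title) == quote(title.encode("utf-8")))  # True

# A different encoding yields different escapes for the same character.
print(quote(title, encoding="latin-1"))  # r%E9volution_essentielle

# unquote() reverses the escaping (it assumes UTF-8 by default).
print(unquote("r%C3%A9volution_essentielle"))  # révolution_essentielle
```

If the server was set up to expect Latin-1 paths, the UTF-8 form above would not resolve, so it is worth checking which form the site itself uses in its links.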