How to encode a URL using urllib


I have this PHP code that I'm trying to port to Python 2.7:

//PHP
$actionSLK = 'https://test.monsite.com/script.cgi';
$storeId = 'test';
$cartId = 'test2';
$totalAmountTx = '100';
$email = 'test@monsite.com';
$SLKSecretKey = 'secret';

$dataMD5=$actionSLK . $storeId . $cartId . $totalAmountTx . $email . $SLKSecretKey;
$checksum=MD5(utf8entities(rawurlencode($dataMD5)));

#PYTHON:
from hashlib import md5
import urllib

actionSLK = 'https://test.monsite.com/script.cgi'
storeId = 'test'
cartId = 'test2'
totalAmountTx = '100'
email = 'test@monsite.com'
SLKSecretKey = 'secret'

dataMD5 = actionSLK + storeId + cartId + totalAmountTx + email + SLKSecretKey
checksum = md5(urllib.quote(dataMD5).encode('utf8')).hexdigest()

The problem is that the two checksums are not the same. I checked the encoded string on each side (the unencoded input is 'https://test.monsite.com/script.cgitesttest2100test@monsite.comsecret'), and this is what actually gets hashed:

//PHP
$checksum=MD5('https%3A%2F%2Ftest.monsite.com%2Fscript.cgitesttest2100test%40monsite.comsecret');
#PYTHON
checksum = md5('https%3A//test.monsite.com/script.cgitesttest2100test%40monsite.comsecret').hexdigest()

So Python does not encode the slash, and that is why the two checksums differ.

Is there another function in urllib that encodes URLs the way PHP's rawurlencode() does?


Solution

  • urllib.quote() is often used to encode parts of a URL, including the path, so by default / is treated as a safe character and left unencoded. Pass safe='' explicitly:

    >>> dataMD5
    'https://test.monsite.com/script.cgitesttest2100test@monsite.comsecret'
    >>> import urllib
    >>> urllib.quote(dataMD5)
    'https%3A//test.monsite.com/script.cgitesttest2100test%40monsite.comsecret'
    >>> urllib.quote(dataMD5, safe='')
    'https%3A%2F%2Ftest.monsite.com%2Fscript.cgitesttest2100test%40monsite.comsecret'
    

    quote_plus() is usually used to create application/x-www-form-urlencoded data, so it uses safe='' by default.

    To find out whether you should use quote_plus() or quote(), consider data with spaces:

    >>> urllib.quote_plus('/ /')
    '%2F+%2F'
    >>> urllib.quote('/ /', safe='')
    '%2F%20%2F'
    

    PHP's rawurlencode() produces the latter (%20 for a space, not +), so you should use quote() with safe='' instead of quote_plus().
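
Putting this together, the checksum from the question could be computed roughly as follows. This is only a sketch: slk_checksum is a hypothetical helper name, the try/except import is an addition so the same code also runs on Python 3 (where quote lives in urllib.parse), and it assumes the PHP utf8entities() step leaves an already percent-encoded ASCII string unchanged.

```python
from hashlib import md5

try:
    from urllib import quote          # Python 2.7, as used in the question
except ImportError:
    from urllib.parse import quote    # Python 3 location of the same function


def slk_checksum(action_url, store_id, cart_id, total_amount, email, secret_key):
    """Hypothetical helper: concatenate the fields, percent-encode, then MD5."""
    data = action_url + store_id + cart_id + total_amount + email + secret_key
    # safe='' makes quote() encode '/' as well, like PHP's rawurlencode().
    encoded = quote(data, safe='')
    # The percent-encoded string is plain ASCII, so UTF-8 encoding is lossless.
    return md5(encoded.encode('utf8')).hexdigest()


# Values taken from the question.
checksum = slk_checksum(
    'https://test.monsite.com/script.cgi',
    'test',
    'test2',
    '100',
    'test@monsite.com',
    'secret',
)
print(checksum)
```

If the result still differs from the PHP side, compare the two percent-encoded strings before hashing, as done in the question; that isolates an encoding mismatch from a hashing one.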