I have been using the pgcrypto extension's digest function to encode several values. I recently discovered that some of the URL values that I'm trying to encode contain '%' characters, which cause my script to throw an IndexError: tuple index out of range.
I have spent hours today trying to fix this issue, but so far I have not been able to correct the error in my code. How do I encode a URL that contains special characters?
This works in pgAdmin4, but not in my Python script:
encode(digest('domainname.com/pub-cgi/retrieve.pl?doc=file%2F1999&zone_19=300%2A%20', 'sha256'), 'hex')
After doing some more research on Stack Overflow, I found a solution that was posted years ago in the question "Decode escaped characters in URL".
This is the code that I used to solve my encoding problem:
import re

# Decode the percent-encoded (%XX) sequences in a URL.
def unquote(url):
    # Replace each %XX escape with the character whose code point is 0xXX.
    return re.sub(r'%([0-9a-fA-F]{2})', lambda m: chr(int(m.group(1), 16)), url)

# URL with encoding - https://www.somedomainname.com/pubs/retrieve.pl?doc=some%2Ddocument%2Dname.pdf
print(unquote('https://www.somedomainname.com/pubs/retrieve.pl?doc=some%2Ddocument%2Dname.pdf'))
# Output - https://www.somedomainname.com/pubs/retrieve.pl?doc=some-document-name.pdf
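For comparison, Python's standard library already provides this kind of decoding as urllib.parse.unquote. The sketch below is only an illustration and not part of the solution I originally used; the helper name decode_url is made up for this example. One difference worth knowing: the regular expression above converts each %XX escape to a character on its own, while the standard library decodes multi-byte UTF-8 escapes such as %C3%A9 as a single character.

```python
from urllib.parse import unquote as std_unquote


def decode_url(url):
    """Decode %XX escapes with the standard library (hypothetical helper name)."""
    return std_unquote(url)


print(decode_url('https://www.somedomainname.com/pubs/retrieve.pl?doc=some%2Ddocument%2Dname.pdf'))
# Output - https://www.somedomainname.com/pubs/retrieve.pl?doc=some-document-name.pdf
```

Both approaches give the same result for the example URL, so which one to use mostly depends on whether the URLs can contain non-ASCII characters.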
Now that the URL is decoded, I can use pgcrypto's digest function to hash it with SHA-256:
encode(digest('https://www.somedomainname.com/pubs/retrieve.pl?doc=some-document-name.pdf', 'sha256'), 'hex')
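An IndexError like the one in the question is typically what a driver such as psycopg2 raises when a literal '%' inside the SQL string is treated as a parameter placeholder. Assuming the script uses a DB-API driver with %s placeholders (the post does not say which driver is in use), a minimal sketch of passing the URL as a bound parameter could look like the following. The function names, the cursor argument, the ::text cast, and the 'hex' output format are all assumptions made for illustration. The hashlib version is a local cross-check that should match pgcrypto when the database encoding is UTF-8.

```python
import hashlib

# Assumed query: the URL is bound as a parameter, so any '%' inside it is
# treated as data and is never parsed as a placeholder by the driver.
HASH_SQL = "SELECT encode(digest(%s::text, 'sha256'), 'hex')"


def sha256_hex_in_db(cursor, url):
    """Hash url with pgcrypto through an open DB-API cursor (e.g. psycopg2)."""
    cursor.execute(HASH_SQL, (url,))
    return cursor.fetchone()[0]


def sha256_hex_local(url):
    """Compute the hex SHA-256 digest in Python, assuming UTF-8 text."""
    return hashlib.sha256(url.encode('utf-8')).hexdigest()
```

With this approach the URL can be hashed as-is, including its %XX escapes. Note that the decoded and the undecoded form of a URL produce different hashes, so one form should be used consistently.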
SPECIAL NOTE: I remove the protocol (for example, https://) from the URLs before hashing them, because this prevents duplicates, which is a concern of mine.
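Removing the protocol could be sketched as below. This is an illustration under my own assumptions rather than the exact code from my script: the helper name strip_scheme and the regular expression are made up for this example, and it only removes a leading "scheme://" prefix.

```python
import re

# Matches a leading URL scheme such as "http://" or "https://".
_SCHEME_RE = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://')


def strip_scheme(url):
    """Return url without its leading scheme (hypothetical helper)."""
    return _SCHEME_RE.sub('', url, count=1)


print(strip_scheme('https://www.somedomainname.com/pubs/retrieve.pl?doc=some-document-name.pdf'))
# Output - www.somedomainname.com/pubs/retrieve.pl?doc=some-document-name.pdf
```

After this step, the http:// and https:// variants of the same address reduce to one string and therefore to one hash.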