python python-3.x postgresql pgcrypto

IndexError: tuple index out of range postgresql


I have been using the digest function from the pgcrypto extension module to encode several values. I recently discovered that some of the URL values I'm trying to encode contain '%', which throws an IndexError: tuple index out of range.

I have spent hours today trying to fix this issue, but so far I have not corrected this error in my code. How do I encode a URL that contains special characters?

This works in pgAdmin4, but not in my Python script:

encode(digest('domainname.com/pub-cgi/retrieve.pl?doc=file%2F1999&zone_19=300%2A%20', 'sha256'), 'hex')



Solution

  • After doing some more research on Stack Overflow, I found a solution that was posted years ago under the question "Decode escaped characters in URL".

    This is the code that I used to solve my encoding problem:

    import re

    # Replace each %XX escape in a URL with the character it encodes
    def unquote(url):
        return re.sub(r'%([0-9a-fA-F]{2})', lambda m: chr(int(m.group(1), 16)), url)
    
    # URL with encoding - https://www.somedomainname.com/pubs/retrieve.pl?doc=some%2Ddocument%2Dname.pdf
    
    print(unquote('https://www.somedomainname.com/pubs/retrieve.pl?doc=some%2Ddocument%2Dname.pdf'))
    
    # Output - https://www.somedomainname.com/pubs/retrieve.pl?doc=some-document-name.pdf
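The standard library provides the same kind of decoding through urllib.parse.unquote, which may be simpler than maintaining a hand-written regular expression. A minimal sketch, reusing the example URL from above. One difference to be aware of: urllib.parse.unquote decodes multi-byte escapes as UTF-8 by default, whereas the regex version maps every %XX escape to a single character.

```python
from urllib.parse import unquote

# Same example URL as above, with %2D standing in for "-"
encoded = "https://www.somedomainname.com/pubs/retrieve.pl?doc=some%2Ddocument%2Dname.pdf"

decoded = unquote(encoded)
print(decoded)
# Output - https://www.somedomainname.com/pubs/retrieve.pl?doc=some-document-name.pdf
```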
    

    Now that the URL is decoded, I can hash it with SHA-256 using the digest function from the pgcrypto extension module.

    encode(digest('https://www.somedomainname.com/pubs/retrieve.pl?doc=some-document-name.pdf', 'sha256'), 'hex')
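Decoding changes the string that gets hashed. If the goal is to hash the URL exactly as stored, another option may be to pass it as a bound query parameter. The IndexError is consistent with a driver such as psycopg2 treating each '%' in the SQL text as a placeholder, but that is an assumption, since the script itself is not shown. The sketch below uses hypothetical helper names, assumes a DB-API cursor whose driver uses %s placeholders, and assumes 'hex' as the output format for encode. The hashlib version is included only as a local cross-check and assumes the text is UTF-8.

```python
import hashlib

# Assumption: the driver uses %s placeholders (psycopg2 does).
# The ::text cast selects the text overload of pgcrypto's digest().
HASH_SQL = "SELECT encode(digest(%s::text, 'sha256'), 'hex')"

def hash_url_in_postgres(cursor, url):
    """Hash `url` with pgcrypto, passing it as a bound parameter."""
    # The URL travels separately from the SQL text, so a '%' inside it
    # is never interpreted as a placeholder.
    cursor.execute(HASH_SQL, (url,))
    return cursor.fetchone()[0]

def hash_url_locally(url):
    """Compute a SHA-256 hex digest in Python, assuming UTF-8 text."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()
```

With a live connection, hash_url_in_postgres(cursor, url) and hash_url_locally(url) should return the same 64-character hex string for a UTF-8 database, which gives a quick way to confirm that the '%' characters survived intact.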
    

    SPECIAL NOTE: I remove the protocol (for example, https://) from each URL before hashing it, because this prevents duplicates, which is a concern of mine.
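The protocol removal mentioned in the note could look something like the sketch below. The function name strip_protocol is hypothetical, and the pattern assumes the protocol always appears as a leading "scheme://" prefix. It is not necessarily how the original script does it.

```python
import re

# Matches a leading scheme such as "http://" or "https://"
_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")

def strip_protocol(url):
    """Return `url` without its leading protocol, if it has one."""
    return _SCHEME_RE.sub("", url, count=1)

print(strip_protocol("https://www.somedomainname.com/pubs/retrieve.pl?doc=some-document-name.pdf"))
# Output - www.somedomainname.com/pubs/retrieve.pl?doc=some-document-name.pdf
```

With this in place, the http:// and https:// forms of the same address reduce to one string and therefore to one hash.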