Search code examples
pythonurl-parsingredaction

Selectively redact variable-length key from URL using Python


I need to use Python to redact a variable-length key from a URL string. All but the last four characters of the key are to be redacted. The last four characters of the key are to intentionally remain unredacted for identification purposes. The character set of the key is ASCII alphanumeric. The URL must otherwise remain unaffected. The character used for redaction () is unicodedata.lookup("FULL BLOCK").

Example input: https://example.com/data?bar=irish&key=dc3e966e4c57effb0cc7137dec7d39ac.

Example output: https://example.com/data?bar=irish&key=████████████████████████████39ac.

I am using Python 3.8. There exists a different question which deals with redacting a password at a different location in the URL and it doesn't help me.

I tried a simple regex substitution but it worked only with a fixed length key whereas I have a variable length key.


Solution

  • A flexible way to do this is using a regular expression substitution with a replacement function. The regex uses non-matching positive lookbehind and lookahead assertions.

    import re
    import unicodedata
    
    _REGEX = re.compile(r"(?<=\Wkey=)(?P<redacted>\w+)(?=\w{4})")
    _REPL_CHAR = unicodedata.lookup("FULL BLOCK")
    
    
    def redact_key(url: str) -> str:
        # Ref: https://stackoverflow.com/a/59971629/
        return _REGEX.sub(lambda match: len(match.groupdict()["redacted"]) * _REPL_CHAR, url)
    
    

    Test:

    redact_key('https://example.com/data?bar=irish&key=dc3e966e4c57effb0cc7137dec7d39ac')
    'https://example.com/data?bar=irish&key=████████████████████████████39ac'
    
    >>> redact_key('https://example.com/data?key=dc3e966e4c57effb0cc7137dec7d39ac')
    'https://example.com/data?key=████████████████████████████39ac'
    
    >>> redact_key('https://example.com/data?bar=irish&key=dc3e966e4c57effb0cc7137dec7d39ac&baz=qux')
    'https://example.com/data?bar=irish&key=████████████████████████████39ac&baz=qux'
    
    >>> redact_key('https://example.com/data?bar=irish&baz=qux')
    'https://example.com/data?bar=irish&baz=qux'