python encryption hash pycrypto

Encrypt a string in Python. Restrict the characters used to only alphanumeric


I would like to encrypt a 10-character (alphanumeric only) string into a 16- or 32-character alphanumeric string.

The string I am encrypting is an asset tag. So in itself it carries no information, but I would like to hide all valid possible strings within a larger group of possible strings. I was hoping that encrypting the string would be a good way to do this.

Is it possible to do this with the Python PyCrypto library?

Here is an example I found regarding using PyCrypto.


Solution

  • You're better off with simple hashing (which is like one-way encryption). To do this, use the md5 function to make a digest and then base64 or base16 encode it. Note that base64 strings can include +, = or /, whereas the base16 (hex) form of an MD5 digest is always exactly 32 alphanumeric characters.

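    import base64
    import hashlib
    
    def obfuscate(s):
        # base64 of the 16-byte MD5 digest (24 characters including padding)
        digest = hashlib.md5(s.encode("utf-8")).digest()
        return base64.b64encode(digest).decode("ascii")
    
    def obfuscate2(s):
        # base16 (hex) of the same digest (32 characters, 0-9 and A-F)
        digest = hashlib.md5(s.encode("utf-8")).digest()
        return base64.b16encode(digest).decode("ascii")
    
    # returns a mostly alphanumeric string, but it can also include slash, plus or equal, i.e. /+=
    print(obfuscate('Tag 1'))
    print(obfuscate('Tag 2'))
    print(obfuscate('Tag 3'))
    
    # returns a hex string
    print(obfuscate2('Tag 1'))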
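
To check which of the two encodings actually meets the alphanumeric-only requirement, a quick sketch like the following can help. The helper names `md5_base64` and `md5_hex` are made up for this illustration, and Python 3 is assumed.

```python
import base64
import hashlib

def md5_base64(tag):
    # 16-byte MD5 digest, base64 encoded (24 characters, ends in '==' padding)
    digest = hashlib.md5(tag.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")

def md5_hex(tag):
    # Same digest as 32 lowercase hex characters
    return hashlib.md5(tag.encode("utf-8")).hexdigest()

for tag in ("Tag 1", "Tag 2", "Tag 3"):
    b64, hx = md5_base64(tag), md5_hex(tag)
    # Print length and whether each form is purely alphanumeric
    print(tag, len(b64), b64.isalnum(), len(hx), hx.isalnum())
```

The base64 form of a 16-byte digest always carries `==` padding, so it never passes `isalnum()`; the hex form always does.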
    

    As has been commented, MD5 is rapidly losing its security, so if you want something more reliable for the future, use the SHA-2 example below.

    import hashlib
    
    def obfuscate(s):
        m = hashlib.sha256()
        m.update(s.encode("utf-8"))  # hashlib works on bytes, not str
        return m.hexdigest()         # 64 hex characters
    
    print(obfuscate('Tag 1'))
    print(obfuscate('Tag 2'))
    print(obfuscate('Tag 3'))
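
A SHA-256 hex digest is 64 characters, which is longer than the 16 or 32 characters asked for. One possible way to fit the target length is to keep only a prefix of the hex digest. This is a minimal sketch, assuming Python 3; the name `obfuscate_hex` and its `length` parameter are hypothetical, not part of any library.

```python
import hashlib

def obfuscate_hex(tag, length=32):
    """Return the first `length` hex characters of the SHA-256 digest of `tag`."""
    if not 1 <= length <= 64:
        raise ValueError("length must be between 1 and 64")
    return hashlib.sha256(tag.encode("utf-8")).hexdigest()[:length]

print(obfuscate_hex("Tag 1"))       # 32 hex characters, i.e. 128 bits of the digest
print(obfuscate_hex("Tag 1", 16))   # 16 hex characters, i.e. 64 bits of the digest
```

Each hex character carries only 4 bits, so a 16-character hex string keeps 64 bits of the digest, noticeably less than a 16-character string over a larger alphabet.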
    

    One more function: this one generates a roughly 96-bit* digest by hashing with SHA-2 and truncating the output, so that the result is restricted to 16 alphanumeric characters. This gives a slightly higher chance of collision but should be good enough for most practical purposes.

    import hashlib
    import base64
    
    def obfuscate(s):
        m = hashlib.sha256()
        m.update(s.encode("utf-8"))
        # one-way base64 encode; altchars maps both '+' and '/' to 'Z' so the output is alphanumeric only
        encoded = base64.b64encode(m.digest(), altchars=b"ZZ").decode("ascii")
        # cut off the hash at 16 chars, which gives about 96 bits
        # by the birthday bound, even 1 billion tags collide with probability well below 1 in a billion
        # http://en.wikipedia.org/wiki/Birthday_attack
        return encoded[:16]
    
    print(obfuscate('Tag 1'))
    print(obfuscate('Tag 2'))
    print(obfuscate('Tag 3'))
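
The collision risk of a truncated digest can be estimated with the usual birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / 2^(bits+1)). The sketch below is only an illustration, assuming Python 3 and outputs that are uniformly distributed; `collision_probability` is a hypothetical helper name.

```python
import math

def collision_probability(num_tags, bits):
    """Approximate chance that at least two of `num_tags` random `bits`-bit values collide."""
    space = 2.0 ** bits
    exponent = num_tags * (num_tags - 1) / (2.0 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-exponent)

for n in (10**6, 10**9):
    print(n, collision_probability(n, 95.27))
```

For a billion tags and roughly 95 bits this comes out on the order of 1e-11, far below one in a billion.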
    

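    *The actual digest carries at most about 95.3 bits, since the encoding uses a 62-character alphabet (and slightly less in practice, because Z stands in for three base64 symbols and so appears more often than the other characters).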

    >>> import math
    >>> math.log(62**16, 2)
    95.26714096618998
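
Mapping both + and / to Z via `altchars` makes Z more frequent than the other characters. If an evenly used 62-character alphabet is preferred, one alternative is to treat the SHA-256 digest as a large integer and write out its base-62 digits. This is a minimal sketch of that swapped-in technique, assuming Python 3; `ALPHABET` and `obfuscate_base62` are names invented for the example.

```python
import hashlib
import string

# 62 symbols: 0-9, A-Z, a-z
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def obfuscate_base62(tag, length=16):
    """Return `length` base-62 digits taken from the SHA-256 digest of `tag`."""
    number = int.from_bytes(hashlib.sha256(tag.encode("utf-8")).digest(), "big")
    chars = []
    for _ in range(length):
        # Peel off the least significant base-62 digit each round
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(chars)

print(obfuscate_base62("Tag 1"))
print(obfuscate_base62("Tag 2"))
print(obfuscate_base62("Tag 3"))
```

Because the 256-bit digest is vastly larger than 62**16, the low 16 base-62 digits are very close to uniformly distributed, so the output approaches the 95.27-bit figure computed above.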