python, python-2.7, uniqueidentifier

Creating a short unique ID based on other values in Python?


I have a number of variables in Python that I want to use to generate a unique ID (one that always comes out the same whenever the variables hold the same values).

I have used .encode('hex','strict') to produce an ID which seems to work, however the output value is very long. Is there a way to produce a shorter ID using variables?

myname = 'Midavalo'
mydate = '5 July 2017'
mytime = '8:19am'

codec = 'hex'

print "{}{}{}".format(myname, mydate, mytime).encode(codec,'strict')

This outputs

4d69646176616c6f35204a756c792032303137383a3139616d

I realise that with hex the output length probably depends on the length of the three variables, so I'm wondering if there is another codec that can/will produce shorter values without excluding any of the variables?

So far I have tested base64, bz2, hex, quopri, uu and zip from "7.8.4. Python Specific Encodings", but I'm unsure how to get any of these to produce shorter values without removing variables.

Is there another codec I could use, or a way to shorten the values from any of them without removing the uniqueness, or even a completely different way to produce what I require?

All I am trying to do is produce an ID so I can identify those rows when loading them into a database. If the same value already exists it will not create a new row in the database. There is no security requirement, just a unique ID. The values are generated elsewhere and passed into Python, so I can't just use a database-issued ID for these values.


Solution

  • You could use a hashing algorithm from the hashlib package: https://docs.python.org/3/library/hashlib.html or, for Python 2: https://docs.python.org/2.7/library/hashlib.html

    import hashlib
    s = "some string"
    digest = hashlib.sha1(s.encode("utf-8")).hexdigest()  # hashlib works on bytes, so encode the string first
    

    This hash would be the same for the same string. Your choice of algorithm depends on the number of characters you want and the risk of collision (two different strings yielding the same hash).
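
As a rough sketch of how this might be applied to the three variables from the question: the helper name make_id, the "\x1f" separator and the 12-character truncation are all illustrative assumptions, not part of the answer above. The separator is there because plain concatenation would give "ab" + "c" and "a" + "bc" the same ID.

```python
import hashlib

def make_id(parts, length=12):
    """Return a short, repeatable ID for a sequence of strings (hypothetical helper)."""
    # Join with a separator that is unlikely to appear in the values, so that
    # ("ab", "c") and ("a", "bc") do not end up with the same ID.
    joined = u"\x1f".join(parts)
    # hashlib works on bytes, so encode the text first.
    digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()
    # Assumption: keeping only the first `length` hex characters is acceptable;
    # the shorter the ID, the higher the chance of a collision.
    return digest[:length]

row_id = make_id(["Midavalo", "5 July 2017", "8:19am"])
```

The full SHA-1 hex digest is always 40 characters, whatever the input length. Truncating to 12 hex characters keeps 48 bits of it, which is probably fine for a modest number of rows but should be lengthened if collisions would be costly.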