Search code examples
phpmysqlprimary-keyuniquebase36

Creating unique non guessable base 36 id


For an application similar to URL shortener services, I want to create non guessable id's, which you're all familiar with I presume. Here's an example of such an id:

http://example.com/sd23t9

What is a good and efficient technique to generate these, with minimal (or no) risk of collision when inserting these as a primary key in a database table?

EDIT:
Piskvor makes an excellent point of course. I should have mentioned that I meant minimal collision risk before the 36^6 limit is met.

EDIT 2
Eh, scrap that, his point was illustrating much more than that of course. Hmmm. Pregenerating a table with id's then, perhaps (like I've read elsewhere already)? Would that be the most efficient technique when i'm bound to the 36^6 and none-consecutive constraints perhaps?


Solution

  • Set ID length. // e.g. 6
    do {
      Generate a short random ID of the given length
      Collision?
       - No:
          DONE!
       - Yes:
          increase ID length
     } while true
    

    For any finite ID length, there's always a risk of collision: assuming from your example that you'd have [a-z0-9]{6} IDs, as soon as you have 2,176,782,336 IDs, a collision is 100% guaranteed (no more available keys). Due to birthday problem, you'll get collisions much, much sooner. With such a small keyspace, there isn't a way to avoid collisions - you'll need to have some sort of collision recovery instead.

    You could generate an ID until it doesn't collide - but this will get progressively slower as the keyspace is being filled: imagine a keyspace of [a-z], with [a-n] and [p-z] already taken - now every new random ID is more likely to collide than not; and when you completely fill the keyspace, the loop will not terminate at all.

    The algorithm I propose is maybe overcautious in this regard: if it finds a collision, it will try progressively longer IDs (as it assumes "collision => not feasible to check the shorter keyspace"). Although it's not efficient, it's likely to find a non-colliding ID within a few iterations.