
Generating a non-sequential ID/PK for a Django Model


I'm on the cusp of starting work on a new webapp. Part of this will give users pages that they can customise, in a one-to-many relationship (one user, many pages). These pages naturally need to have unique URLs.

Left to its own devices, Django would normally assign a standard AUTOINCREMENT ID to a model. While this works fantastically, it doesn't look great and it also makes pages very predictable (something that isn't desired in this case).

Rather than 1, 2, 3, 4 I would like set-length, randomly generated alphanumeric strings (eg h2esj4). 6 spots of a possible set of 36 characters should give me over two billion combinations which should be more than enough at this stage. Of course if I could expand this at a later time, that would be good too.
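To sanity-check that arithmetic, here is a minimal sketch that computes the keyspace for a few lengths. `keyspace` and `ALPHABET` are illustrative names chosen for this example, not anything provided by Django.

```python
import string

# Alphabet assumed from the question: lowercase letters plus digits (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits


def keyspace(length: int, alphabet: str = ALPHABET) -> int:
    """Number of distinct fixed-length IDs over the given alphabet."""
    return len(alphabet) ** length


# Each extra character multiplies the number of possible IDs by len(ALPHABET).
for length in (6, 7, 8):
    print(length, f"{keyspace(length):,}")
```

Six characters give 36^6 = 2,176,782,336 IDs, and each added character multiplies that by 36, so growing the ID length later is a cheap way to expand.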

But there are two issues:

  1. Random strings occasionally spell out bad words or other offensive phrases. Is there a decent way to sidestep that? To be fair I could probably settle for a numeric string but it does have a hefty hit on the likelihood of clashes.

  2. How do I get Django (or the database) to do the heavy lifting on insert? I'd rather not insert and then work out the key (as that wouldn't be much of a key). I assume there are concurrency issues to be aware of too: two new pages could be generated at the same time, and the second could (against all odds) get the same key as the first before the first is committed.
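On the second point, one general pattern (sketched here with the standard-library `sqlite3` module as a stand-in for the real database, not as Django code) is to let a uniqueness constraint do the checking and simply retry when an insert is rejected. The table name `page`, the helper names, and the retry limit are all assumptions made for this example.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_lowercase + string.digits


def random_key(length: int = 6) -> str:
    """Return a random fixed-length key drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def insert_page(conn: sqlite3.Connection, title: str, max_attempts: int = 10) -> str:
    """Insert a row under a fresh key, retrying if the key is already taken."""
    for _ in range(max_attempts):
        key = random_key()
        try:
            # The PRIMARY KEY constraint is enforced by the database, so if two
            # writers pick the same key, only one insert can succeed.
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO page (id, title) VALUES (?, ?)", (key, title)
                )
            return key
        except sqlite3.IntegrityError:
            continue  # collision: pick another key and try again
    raise RuntimeError("could not find a free key")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (id TEXT PRIMARY KEY, title TEXT NOT NULL)")
print(insert_page(conn, "My first page"))
```

In Django the same idea would probably mean catching `django.db.IntegrityError` around `save(force_insert=True)`, so that a clash raises an error instead of being treated as an update to the existing row.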

I don't see this being a million miles different from how URL shorteners generate their IDs. If there's a decent Django implementation of one, I could piggyback off that.


Just in case you think this isn't a real problem, we can explore how common this really is together, with some code.

This downloads a naughty word list, and then generates 1000 [a-z0-9]{6} strings, and counts how many contain one of these bad words.

import random, string, json, urllib.request

wordlist: str = "https://raw.githubusercontent.com/zacanger/profane-words/refs/heads/master/words.json"
swear_words: list[str] = json.loads(urllib.request.urlopen(wordlist).read())
swear_words = [sw for sw in swear_words if len(sw) <= 6]

population = string.ascii_lowercase + string.digits
bad_count: int = 0

for _ in range(1_000):
    pick = ''.join(random.choices(population=population, k=6))
    if any(sw in pick for sw in swear_words):
        bad_count += 1
        continue

print(bad_count)

For statistical juice, I cranked this up to 1,000,000 and got 8823 bad strings back. That's roughly 1-in-113 results and real-life results could be worse…

  • Longer wordlists? Is FAULTY what you want your product identified as?!
  • Different character populations, eg [a-zA-Z]{6} give 1-in-40 bad results.
  • Using longer IDs? [a-zA-Z0-9]{8} renders 1-in-25 bad results here!
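Two mitigations are commonly combined for this: draw from an alphabet with no vowels, so most words cannot be spelled at all, and reject any candidate that still contains a blocked word. The sketch below is only an illustration; `SAFE_ALPHABET`, `BLOCKLIST` and `make_id` are hypothetical names, and the two-word blocklist is a placeholder for a real word list.

```python
import secrets
import string

# Assumption: dropping vowels (and the look-alikes 0, 1, l) stops most words
# from being spelled at all. This alphabet is illustrative, not a standard.
SAFE_ALPHABET = "".join(
    c for c in string.ascii_lowercase + string.digits if c not in "aeiou01l"
)

# Hypothetical stand-in for a real word list such as the one downloaded above.
BLOCKLIST = {"xxx", "wtf"}


def make_id(length: int = 6) -> str:
    """Draw an ID from the reduced alphabet, rejecting blocked substrings."""
    while True:
        candidate = "".join(secrets.choice(SAFE_ALPHABET) for _ in range(length))
        if not any(word in candidate for word in BLOCKLIST):
            return candidate


print(len(SAFE_ALPHABET), make_id())
```

The trade-off is a smaller keyspace: 28 symbols over 6 positions is 28^6 = 481,890,304 IDs, so an extra character may be needed to get back above the original two billion.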

Solution

  • Here's what I ended up doing. I made an abstract model. My use-case for this is needing several models that generate their own, random slugs.

    A slug looks like AA##AA so that's 52x52x10x10x52x52 = 731,161,600 combinations. Probably a thousand times more than I'll need and if that's ever an issue, I can add a letter for 52 times more combinations.

    Use of the default argument wouldn't cut it as the abstract model needs to check for slug collisions on the child. Inheritance was the easiest, possibly only way of doing that.

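    import random
    import string

    from django.db import models


    class SluggedModel(models.Model):
        # primary_key=True already implies unique=True.
        slug = models.SlugField(primary_key=True, editable=False, blank=True)

        def save(self, *args, **kwargs):
            while not self.slug:
                # AA##AA: random.choices samples with replacement, so repeated
                # characters are allowed and all 52*52*10*10*52*52 slugs are reachable.
                newslug = ''.join(
                    random.choices(string.ascii_letters, k=2)
                    + random.choices(string.digits, k=2)
                    + random.choices(string.ascii_letters, k=2)
                )

                # Managers aren't accessible via instances, so go through the class.
                # Note: this check-then-save is not atomic under concurrent writes.
                if not type(self).objects.filter(pk=newslug).exists():
                    self.slug = newslug

            super().save(*args, **kwargs)

        class Meta:
            abstract = True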
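To get a feel for how often that `while` loop would have to retry, here is a rough estimate. It assumes slugs are drawn uniformly from all 731,161,600 AA##AA combinations; the function names are made up for this example.

```python
# Keyspace for the AA##AA pattern described above.
KEYSPACE = 52 ** 4 * 10 ** 2


def collision_probability(existing_rows: int, keyspace: int = KEYSPACE) -> float:
    """Chance that one freshly generated slug is already taken."""
    return existing_rows / keyspace


def expected_attempts(existing_rows: int, keyspace: int = KEYSPACE) -> float:
    """Mean number of tries until a free slug turns up (geometric distribution)."""
    return 1 / (1 - collision_probability(existing_rows, keyspace))


for rows in (1_000, 1_000_000, 100_000_000):
    p = collision_probability(rows)
    print(f"{rows:>11,} rows: p={p:.6f}, attempts={expected_attempts(rows):.4f}")
```

Even at a million rows the chance of a clash on any single attempt stays well under 1%, so the loop should almost always finish on its first pass.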