Search code examples
phpdatabaseperformancealgorithmurl-shortener

URL Shortener algorithm


I recently bought myself a domain for personal URL shortening. And I created a function to generate alphanumeric strings of 4 characters as reference.

BUT

How do I check if they are already used or not? I can't check for every URL if it exists in the database, or is this just the way it works and I have to do it? If so, what if I have 13.000.000 URLs generated (out of 14.776.336). Do I need to keep generating strings until I have found one that is not in the DB yet?

This just doesn't look the right way to do it, anyone who can give me some advise?


Solution

  • One memory efficient and faster way I think of is following. This problem can be solved without use of database at all. The idea is that instead of storing used urls in database, you can store them in memory. And since storing them in memory can take a lot of memory usage, so we will use a bit set (an array of bits ) and we only one bit for each url.

    1. For each random string you generate, create a hashcode for that that lies b/w 0 and max number K.
    2. Create a bit set( basically a bit array). Whenever you use some url, set corresponding hash code bit in bit set to 1.
    3. Whenever you generate a new url, see if its hashcode bit is set. If yes, then discard that url and generate a new one. Repeat the process till you get one unused one.

    This way you avoid DB forever, your lookups are extremely fast and it takes least amount of memory.

    I borrowed the idea from this place