
What algorithm does YouTube use to generate video slugs?


When we open a video on YouTube, the URL contains a string of seemingly random characters, for example https://www.youtube.com/watch?v=cpp69ghR1IM.

Is there an algorithm for this, or does YouTube just generate a random string and check whether it is already in the database? Since YouTube has a huge number of videos, wouldn't checking the uniqueness of each generated string be a waste of time?

Also, why doesn't YouTube use nicer slugs generated from the video title? For example: https://www.youtube.com/watch/Some-Dummy-Video-Title

Thanks in advance.


Solution

  • The 11-character base64 string is just an encoded long integer.

    It's hard to know for sure, but my suspicion is that they start with a sequential number and obfuscate it using something similar to the multiplicative inverse I describe in https://stackoverflow.com/a/34420445/56778. Then, they base64 encode the result.

    For a more detailed treatment, see my blog post, http://blog.mischel.com/2017/06/20/how-to-generate-random-looking-keys/.

    As for why they don't use better-looking slugs, you'd have to ask them. Here are some possibilities I came up with offhand:

    1. It's easy to ensure that their base64 encoded numbers are unique. Enforcing uniqueness of titles is difficult.
    2. They'd probably have to run some kind of "naughty word" filter on those nicer-looking titles. That's a surprisingly difficult problem.
    3. Slugs derived from titles make editing video titles more difficult.
    4. Sometimes the video titles contain garbage.
    5. The existing slugs are easy to generate, non-controversial, and nobody looks at them anyway. Why waste time on them?
    6. Because they've always done it that way.
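
The "sequential number, obfuscate, then base64 encode" idea from the first part of the answer can be sketched as follows. This is only an illustration of the multiplicative-inverse technique, not YouTube's actual scheme, which is not public. The constant MULTIPLIER and the function names encode_id and decode_id are hypothetical; any odd multiplier is invertible modulo 2**64, so the mapping is one-to-one and no database lookup is needed to guarantee uniqueness.

```python
import base64

MODULUS = 1 << 64                 # all arithmetic is done modulo 2**64
MASK = MODULUS - 1

# Hypothetical odd constant chosen for illustration only.
# Any odd number has a multiplicative inverse modulo 2**64.
MULTIPLIER = 0x9E3779B97F4A7C15
INVERSE = pow(MULTIPLIER, -1, MODULUS)  # modular inverse, Python 3.8+


def encode_id(n: int) -> str:
    """Turn a sequential 64-bit number into an 11-character slug."""
    if not 0 <= n <= MASK:
        raise ValueError("n must fit in 64 bits")
    scrambled = (n * MULTIPLIER) & MASK          # obfuscate
    raw = scrambled.to_bytes(8, "big")           # 8 bytes = 64 bits
    # 8 bytes encode to 11 base64 characters plus one '=' of padding,
    # which is stripped here. The URL-safe alphabet uses '-' and '_'.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_id(slug: str) -> int:
    """Recover the original sequential number from a slug."""
    raw = base64.urlsafe_b64decode(slug + "=")   # restore the padding
    scrambled = int.from_bytes(raw, "big")
    return (scrambled * INVERSE) & MASK          # undo the obfuscation


if __name__ == "__main__":
    for n in (1, 2, 3):
        slug = encode_id(n)
        print(n, slug, decode_id(slug))
```

Consecutive inputs produce slugs that look unrelated, yet every slug decodes back to exactly one number, so uniqueness follows from the counter alone.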