Share Link Generation Security

I have always wondered how websites generates "share with others" links. Some websites allow you to share a piece of data through a link in order to let people you sent the link to to be able to see the data or even edit it. For example Google Drive, OneDrive, etc... They give you a (pretty short) link, but what guaranties me that it's not possible for someone to find this link "by luck" and access my data? Like what if an attacker was trying all the possibilities of links: https://link.share.me/xxxxxxx till he finds some working ones?

Is there a certain length which almost guaranties that no one will find one link this way ? For example if a site generated 1000 links, if the endpoints are composed of 10 times a [A-Za-z0-9] like character (~8e17 possibilities), we just assume that it is secure enough ? If yes, at what probability or ratio between links and possibilities do we consider this kind of system as secure?
Is there a certain cryptographic or mathematic way of generating those links which assure us that a link cannot be found by anyone?

Thank you very much.

Solution

Probably the most important thing (besides entropy, which we will come back to in a second) is where you get random from. For this purpose you should use a cryptographic pseudo-random number generator (crypto prng). (As a sidenote, you could also use real random, but a real random source is very hard to come by, if you generate many links, you will likely run out of available random bits, so a crypto prng is probably good enough for your purpose, few applications do actually need real random numbers). Most languages and/or frameworks have a facility for this, in Ruby it is SecureRandom, in Java it's java.security.SecureRandom for example, in python it could be os.urandom and so on.

Ok so how long should it be. It somewhat depends on your other non-security requirements as well, for example sometimes these need to be easy to say over the phone, easy to type or something similar. Apart from these, what you should consider is entropy. Your idea of counting the number of all possible codes is a great start, let's just say that the entropy in the code is log2 (base 2 logarithm) of that number. So for a case sensitive, alphanumeric code that is 10 characters long, the entropy is log2((26+26+10)^10) = 59.5 bits. You can compute the entropy for any other length and character set the same way.

That might well be enough, what you should consider is your attacker. Will they be able to perform online attacks only (a lot slower), or offline too (can be very-very fast, especially with specialized hardware)? Also what is the impact if they find one, is it like financial data, or just a random funny picture, or the personal data of somebody, for which you are legally responsible in multiple jurisdictions (see GDPR in EU, or the California privacy laws)?

In general, you could say that 64 bits of entropy is probably good enough for many purposes, and 128 bits is a lot (except maybe for cryptographic keys and very high security applications). As the 59 bits above is.. well, almost 64, for lower security apps that could for example be a reasonable tradeoff for better usability.

So in short, there is no definitive answer, it depends on how you want to model this, and what security requirements you want to meet.

Two more things to consider are the validity of these codes, and how many will be issued (how dense will the space be).

I think the usual variables here are the character set for the code, and its length. Validity is more like a business requirement, and the density of codes will depend on your usage and also the length (which defines the size of your code space).

As an example, let's say you have 64 bits of entropy, you issued 10 million codes already, and your attacker can only perform online attacks by sending a request to your server, at a rate of say 100/second. These are likely huge overstatements towards the secure side.

That would mean there is a 0.17% chance somebody could find a valid code in a year. But would your attacker put so much effort into finding one single (random) valid code? Whether that's acceptable for you only depends on your specific case, only you can tell. If not, you can increase the length of the code for example.