
Alphabet to use for international validation code


We are looking to generate a validation code that end customers around the world can use to gain access and/or validate various scenarios. The code would be generated on our servers, then transferred to and shown in an app, but it may also need to be passed on to other users/persons manually, by email, maybe even by phone. An example of such a code could be ABC123.

We are looking to find an "alphabet" for the characters in such a validation code that balances some concerns:

  • The alphabet must be recognizable around the world, which may limit the letters/numbers/symbols that we can use
  • The code must be enterable in e.g. an app on the customer's phone, so it would need to work with local keyboards
  • Since the code may be transferred by phone or similar analog means, a shorter code would be preferable to avoid mistakes
  • Some codes may be valid for a longer period of time, so we would like a broad enough alphabet and a long enough code that "brute force" attacks are not a viable attack vector
  • We would likely prefer case-insensitive codes to avoid confusion in analog transfers

Our initial idea was simply a 9-digit number, for example, but there are some concerns around brute force attacks, and the business side would ideally want an even shorter code. Moving to e.g. A-Z and 0-9 (excluding some characters that are easily mixed up, such as "O" and 0), the alphabet size grows to 20+ and a shorter code would be possible, but how about international users and their phone keyboards?

Summary: We would like some feedback, thoughts or even existing standards for how to generate a globally acceptable, easily entered and transferred, yet still secure code from an alphabet that is accessible across the globe.


Solution

  • I can't imagine a scenario where a user pretty much anywhere around the world cannot enter an alphanumeric code (i.e. letters and numbers, maybe with some letters omitted for clarity as you described). Such a user would be unable to use most of the internet in general.

    As for the length, you should consider the entropy of your code. For reasonably secure codes, you should have something like 32 bits of entropy (depends a lot on your anticipated brute force power, see below), while very secure would be 128 or even 256+ bits.

    With a character set of 20 characters and a length of 9, the entropy (if and only if the code is generated correctly, i.e. uniformly at random) is 38.9 bits (log2(20^9)), so that might be ok. But only you can tell how fast a brute force attack you should expect, which largely depends on whether offline brute force is possible or every guess has to be an online request.

    A length of 9 also looks good from a UX perspective, because you can visually group it for the user as 3x3 characters, which is easy to say and to read back.

    Again, if fast offline brute force is possible in your case (you might consider specialized hardware as well, if that is a potential threat), these fewer than 39 bits might be insufficient.

    A code from an alphabet of 20 characters with a length of 9 has 20^9 = 5.12 * 10^11 possible values. At only 1 million guesses per second, an exhaustive brute force would take around 6 days, and on average the right code would be found in about half that time. So these 39 bits of entropy are not a lot, and something like specialized hardware (also depending on the algorithm) will probably be a lot faster than that, if offline attacks are possible. However, if it's a web service and the code is used as a second factor of authentication in some way, or as a time-limited one-time password or token, this may very well be enough. In short, it depends on your exact scenario; the above is a line of thought you can follow to figure out whether it's good enough for you.
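
To make the numbers above easy to re-run for other alphabet sizes and lengths, here is a minimal Python sketch. It is an illustration under assumptions, not a recommended standard: the 20-character `ALPHABET` is a hypothetical choice (uppercase letters and digits with look-alikes such as O/0 and I/1 removed), and the function names, the `-` separator and the 3-character grouping are made up for this example. The only part that really matters for security is that the code is drawn from a cryptographically secure random source, which is what Python's `secrets` module provides.

```python
import math
import secrets

# Hypothetical 20-character alphabet (an assumption for this sketch):
# uppercase letters and digits, leaving out symbols that are easily
# confused when read or spoken (O/0, I/1/L, B/8, S/5, Z/2, ...).
ALPHABET = "ACDEFHJKMNPRTWXY3479"
CODE_LENGTH = 9


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a uniformly random code: log2(alphabet_size ** length)."""
    return length * math.log2(alphabet_size)


def exhaustive_search_days(alphabet_size: int, length: int,
                           guesses_per_second: float) -> float:
    """Days needed to try every possible code at the given guess rate."""
    seconds = alphabet_size ** length / guesses_per_second
    return seconds / 86_400  # seconds per day


def generate_code(length: int = CODE_LENGTH, alphabet: str = ALPHABET) -> str:
    """Pick each character independently with a CSPRNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def format_code(code: str, group: int = 3) -> str:
    """Group the code for display, e.g. 9 characters as 3x3."""
    return "-".join(code[i:i + group] for i in range(0, len(code), group))


def normalize_code(user_input: str) -> str:
    """Drop separators/whitespace and ignore case before comparing."""
    return "".join(ch for ch in user_input.upper() if ch.isalnum())


if __name__ == "__main__":
    print(f"entropy: {entropy_bits(20, 9):.1f} bits")
    print(f"full search at 1M guesses/s: "
          f"{exhaustive_search_days(20, 9, 1_000_000):.1f} days")
    print("example code:", format_code(generate_code()))
```

Changing `ALPHABET` or `CODE_LENGTH` and re-running shows directly how much each extra character or symbol buys you. Normalizing the input before comparison keeps the code case-insensitive and tolerant of the visual grouping, as asked for in the question.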