
How to create unique tokens correctly in PHP?


I'm new to PHP and I'm studying on my own. Normally I create my tokens and insert them into my tables like this:

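private function create_token($reference, $bytes, $slice)
{
    // random bytes -> hex -> base64, strip non-word characters (+, /, =), keep the first $slice characters
    $key = substr(preg_replace('/\W/', "", base64_encode(bin2hex(random_bytes($bytes)))), 0, $slice);
    return $reference . $key;
}

// Called from inside the same class:
$token = $this->create_token('token_B8', 34, 22); // 'token_B8' followed by a 22-character key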

This may be a correct way to create tokens, but my doubt is whether it really is the correct way. I was thinking: obviously the chance of two identical tokens ending up in the table is something like 1 in 1,000,000,000, correct? Or is there a way to create a token that guarantees that under no circumstances is an identical token created, without having to write a function that checks whether the token already exists in the table?

I believe what I should do is keep creating tokens the way I do now and write a function that checks whether the token already exists in the table: if it does, generate another token; if not, insert it into the table. This seems like a correct approach, but as I'm new I don't know if there is a more appropriate way. Can someone clear up this doubt for me? Thanks.


Solution

  • The string generated by random_bytes() is maximally random. Literally everything you do to it after that decreases the amount of randomness per character, and since you then cut the result down to a fixed length with substr(), it decreases the number of possible values your token can take.

    1. random_bytes() produces 8 bits of randomness per byte.
    2. bin2hex() stretches each byte of input over two bytes. [x0.5]
    3. base64_encode() stretches 3 input bytes over 4 output bytes. [x0.75]
    4. preg_replace('/\W/', "", $input) deletes the +, / and = characters, effectively changing the encoding from base64 to base62 and decreasing the space slightly once again. [x??? < 1]
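The non-uniformity behind these factors is easy to observe. bin2hex() output only ever contains the 16 bytes 0-9 and a-f, so when base64_encode() packs them 3 bytes at a time, the first character of every 4-character group can only be one of five letters, carrying at most log2(5) ≈ 2.3 bits instead of 6. A minimal sketch to check this empirically (the function name is illustrative; 34 bytes mirrors the question's call):

```php
<?php
// Collect the character at positions 0, 4, 8, ... of base64_encode(bin2hex(...)),
// i.e. the first character of every 4-character base64 group.
function group_leading_chars(int $samples): string
{
    $seen = [];
    for ($i = 0; $i < $samples; $i++) {
        $encoded = base64_encode(bin2hex(random_bytes(34)));
        for ($p = 0; $p < strlen($encoded); $p += 4) {
            $seen[$encoded[$p]] = true;
        }
    }
    ksort($seen);
    return implode('', array_keys($seen));
}

echo group_leading_chars(1000), "\n"; // MNOYZ
```

Every run prints the same five letters, no matter how random the input bytes were.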

    So, all told, the 22-byte key you're generating represents 22 * 8 * 0.5 * 0.75 * ??? <= 66 bits of random data, so at most 2^66 = 73,786,976,294,838,206,464 possibilities.

    Boy howdy, that sure seems like a lot, right? Well, not really. Because of the birthday paradox, the probability of a collision becomes large enough to cause issues while you're still many orders of magnitude away from filling the range.
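To put numbers on this, the standard birthday approximation gives the probability of at least one collision among n uniformly random values drawn from a space of 2^bits values as roughly 1 - exp(-n(n-1) / 2^(bits+1)). A small sketch (the function name and token counts are illustrative; it assumes every bit is uniformly random, which the encoding chain above does not guarantee, so the real figures for the question's token can only be worse):

```php
<?php
// Birthday-bound approximation: probability that at least one pair among $n
// uniformly random tokens collides in a space of 2**$bits values:
//   p ≈ 1 - exp(-n(n-1) / (2 * 2**bits))
// -expm1(-x) computes 1 - exp(-x) without losing precision when x is tiny.
function collision_probability(float $n, float $bits): float
{
    $pairs = $n * ($n - 1) / 2;
    return -expm1(-$pairs / (2.0 ** $bits));
}

foreach ([66, 132, 176] as $bits) {
    printf("%3d bits, 1e9 tokens: %.3e\n", $bits, collision_probability(1e9, $bits));
}
printf("66 bits, 2^33 tokens: %.2f\n", collision_probability(2.0 ** 33, 66)); // 0.39
```

With 2^33 (about 8.6 billion) tokens, a 66-bit space already has roughly a 39% chance of containing a duplicate, while 176 bits keeps the probability negligible at any realistic scale.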

    I guess if we remove that pointless bin2hex() we could squeeze out another 66 bits, for 132 in total (22 * 8 * 0.75)? But how much more does that really get us?

    2^132 = 5,444,517,870,735,015,415,413,993,718,908,291,383,296

    A lot. A lot. I don't even care about that preg_replace() anymore.

    For the sake of completeness, what about just a random_bytes(22)? 176 bits?

    2^176 = 95,780,971,304,118,053,647,396,689,196,894,323,976,171,195,136,475,136
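The bit counts above are just 8 bits per byte multiplied by the per-step factors from the list. A tiny sketch of that arithmetic (key_bits() is a hypothetical helper; the preg_replace() factor is left out, so the first two figures are upper bounds):

```php
<?php
// Random bits in a key of $chars characters: 8 bits per byte times each
// per-step factor (bin2hex() = 0.5, base64_encode() = 0.75).
function key_bits(int $chars, array $factors): float
{
    $bits = $chars * 8.0;
    foreach ($factors as $factor) {
        $bits *= $factor;
    }
    return $bits;
}

$variants = [
    'base64_encode(bin2hex(...))' => [0.5, 0.75], // the question's pipeline
    'base64_encode(...)'          => [0.75],      // bin2hex() removed
    'random_bytes(22)'            => [],          // raw bytes
];

foreach ($variants as $label => $factors) {
    $bits = key_bits(22, $factors);
    printf("%-28s %3d bits, ~%.3e possible keys\n", $label, $bits, 2 ** $bits);
}
```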

    I guess the take-aways are:

    1. Don't confuse data encoding with "make more random" just because the output looks garbled. [Note: the same goes for hash functions]
    2. Don't apply functions/encodings willy-nilly if you don't know what they are actually doing.

    In code:

    $input = 'abc';
    
    // all of these outputs contain the SAME amount of entropy, some of them are just longer representations
    var_dump(
        $input,
        bin2hex($input),
        base64_encode($input),
        base64_encode(bin2hex($input)),
        bin2hex(base64_encode($input))
    );
    

    Output:

    string(3) "abc"
    string(6) "616263"
    string(4) "YWJj"
    string(8) "NjE2MjYz"
    string(8) "59574a6a"
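Applying the take-aways to the question's function: one option, sketched here as a hypothetical free function rather than the original private method, is to drop bin2hex() and preg_replace() and use the URL-safe base64 alphabet (base64url, RFC 4648) instead. strtr() swaps + and / for - and _ rather than deleting them, so no randomness is thrown away. The 16-byte default (128 bits, 22 characters) is an assumption, not a requirement:

```php
<?php
// Hypothetical revision of the question's create_token(): no bin2hex() step and
// no preg_replace(). strtr() maps base64's '+' and '/' to the URL-safe '-' and
// '_' (base64url) instead of deleting them, so every output character keeps its
// full 6 bits, and rtrim() only strips the '=' padding.
function create_token(string $reference, int $bytes = 16): string
{
    $key = rtrim(strtr(base64_encode(random_bytes($bytes)), '+/', '-_'), '=');
    return $reference . $key;
}

$token = create_token('token_B8');
echo $token, "\n"; // 'token_B8' followed by 22 characters carrying 128 random bits
```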
    

    Anyway, with a sufficiently large random ID space it's more pragmatic to just put a UNIQUE constraint on the column and let the INSERT fail if a duplicate value ever turns up. You can add some retry logic, but odds are it will never actually run, unless someone exploits a vulnerability specifically to make you generate duplicates and DoS yourself with retries. [yes, this is a thing]
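A minimal sketch of the UNIQUE-constraint approach with PDO, assuming a table named tokens with a UNIQUE token column (both names, the insert_unique_token() helper, and the attempt limit are hypothetical). It treats SQLSTATE 23000 (integrity constraint violation, which MySQL also uses for duplicate keys) as a collision and re-throws any other error; the demo uses an in-memory SQLite database, so it needs the pdo_sqlite extension but no server:

```php
<?php
// Hypothetical helper: rely on a UNIQUE constraint instead of a SELECT-then-INSERT
// check, and retry a bounded number of times if the insert hits a duplicate.
function insert_unique_token(PDO $db, callable $generate, int $maxAttempts = 3): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $token = $generate();
        // A fresh statement per attempt avoids re-executing one that just failed.
        $stmt = $db->prepare('INSERT INTO tokens (token) VALUES (?)');
        try {
            $stmt->execute([$token]);
            return $token;
        } catch (PDOException $e) {
            // SQLSTATE 23000 = integrity constraint violation (here: duplicate token).
            // Any other database error is a real problem, so re-throw it.
            if (($e->errorInfo[0] ?? null) !== '23000') {
                throw $e;
            }
        }
    }
    throw new RuntimeException("No unique token after $maxAttempts attempts");
}

// Demo against an in-memory SQLite database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE tokens (token TEXT NOT NULL UNIQUE)');

// 16 random bytes as hex: 128 bits, nothing truncated.
$token = insert_unique_token($db, fn() => 'token_B8' . bin2hex(random_bytes(16)));
echo $token, "\n";
```

Unlike a SELECT followed by an INSERT, this leaves no window in which two concurrent requests can both see a token as free and then both insert it, because the database itself enforces uniqueness.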