Search code examples
c#randombase36

Why isn't this Base36 random string using RandomNumberGenerator randomly distributing the characters


I am trying to generate a random Base36 string using C#. I am using RandomNumberGenerator rather than Random as the code needs to be threadsafe. I have the following code setup:

private readonly RandomNumberGenerator _random = RandomNumberGenerator.Create(); 

private string GenerateBase36Token(int length)
{
    string token = string.Empty;

    for (int i = 0; i < length; i++)
    {
        byte[] bytes = new byte[100];
        _random.GetBytes(bytes);
        token += ToBase36String(bytes)[0];
    }

    return token;
}

private string ToBase36String(byte[] toConvert)
{
    const string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    BigInteger dividend = new BigInteger(toConvert);
    StringBuilder builder = new StringBuilder();

    while (dividend != 0)
    {
        BigInteger remainder;
        dividend = BigInteger.DivRem(dividend, alphabet.Length, out remainder);
        builder.Insert(0, alphabet[Math.Abs((int)remainder)]);
    }

    return builder.ToString();
}

This appears to work however looking at the results it is apparent that the strings are not evenly distributing the potential characters. There are a lot of repeating letters and numbers very rarely appear.

Is the problem just taking the first characters of the random string or is there an issue with how the string is being built?


Solution

  • I think you should use modulus instead of DivRem if you want to stick to this approach. My motivation for this is, that if you keep dividing a large number by a smaller number, you get a situation where it doesn't matter whether the original number was relatively higher or lower (i.e. an amount of 100, relative to a big number).

    For example, take these numbers as input (just as an example): 36.000.000 as your dividend, and 10 as your divisor. The while loop in your ToBase36String will go as follows:

    iteration 1: dividend: 36.000.000 remainder: 3.600.000

    iteration 2: dividend: 3.600.000 remainder: 360.000

    iteration 3: dividend: 360.000 remainder: 36.000

    iteration 4: dividend: 36.000 remainder: 3.600

    iteration 5: dividend: 3.600 remainder: 360

    iteration 6: dividend: 360 remainder: 36

    iteration 7: dividend: 36 remainder: 3

    If we would have started with i.e. 38.000.000 or 31.000.000 as the dividend, it wouldn't have mattered, because iteration 7 would have gotten a remainder of 3 anyways due to how integer division works.

    The point that I'm trying to make, is that it seems unnecessary to me, to randomly generate a number larger than 36 for each base36 character, and your GenerateBase36Token method creates 100 bytes of data for each character.

    Also, I'm wondering as to why you would want a base36 character, while base64 is a widely used and accepted format for encoding of data.

    tl;dr: A quick and easy solution could be to just generate a single byte of random data, and use the modulus operator instead of the DivRem method.

    EDIT: Updated your code

    private readonly RandomNumberGenerator _random = RandomNumberGenerator.Create(); 
    
    private string GenerateBase36Token(int length)
    {
        string token = string.Empty;
    
        for (int i = 0; i < length; i++)
        {
            byte[] bytes = new byte[1]; //edited byte array size
            _random.GetBytes(bytes);
            token += ToBase36String(bytes)[0];
        }
    
        return token;
    }
    
    private string ToBase36String(byte[] toConvert)
    {
        const string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        int dividend = (int)toConvert[0];
        StringBuilder builder = new StringBuilder();
    
        int remainder;
        remainder = dividend % alphabet.Length; //edited DivRem method usage to modulo operator usage
        builder.Insert(0, alphabet[remainder]);
    
        return builder.ToString();
    }