c# .net string compression uint64

Shortest way to represent UInt64 as a string


I get a possibly large number (up to UInt64.MaxValue: 18446744073709551615) as a normal base-10 number. This number will eventually become a filename, e.g. 12345678945768.txt

Since filenames on Windows aren't limited to just numerical digits, I would like to "compress" this into a shorter string, but I need to make sure the strings can be mapped back to a number.

For smaller numbers such as 0001365555, hex is much shorter than anything else. Everything I've found so far states that Base64 would be shortest, but it isn't.

So far I've tried this:

//18446744073709551615 - 20
UInt64 i = UInt64.MaxValue; // 0001365555

//"//////////8=" - 12
string encoded = Convert.ToBase64String(BitConverter.GetBytes(i)); 

//"FFFFFFFFFFFFFFFF" - 16
string hexed = i.ToString("X"); 

//"MTg0NDY3NDQwNzM3MDk1NTE2MTU=" - 28
string utf = Convert.ToBase64String(System.Text.Encoding.ASCII.GetBytes(i.ToString())); 

Is there a better way to "compress" an integer, similar to hex but using 00-zz rather than just 00-FF?

Thanks in advance!


Solution

  • Everything I've found so far states that Base64 would be shortest, but it isn't.

    You don't want to use Base64. Base64 encoded text can use the / character, which is disallowed in file names on Windows. You need to come up with something else.

    What else?

    Well, you could write your own base conversion, perhaps something like this:

    // Requires: using System.Text;
    public static string Convert(ulong number)
    {
        // 73 characters, all legal in Windows file names.
        const string validCharacters = "qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM1234567890!@#$%^&()_-";
        ulong radix = (ulong)validCharacters.Length;
        var buffer = new StringBuilder();
        var quotient = number;
        // do/while so that 0 yields one character instead of an empty string.
        do
        {
            ulong remainder = quotient % radix;
            quotient /= radix;
            // Prepend, because the least significant digit is produced first.
            buffer.Insert(0, validCharacters[(int)remainder]);
        } while (quotient != 0);
        return buffer.ToString();
    }
    

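    This produces a "base-73" result. The more characters in validCharacters, the shorter the output: UInt64.MaxValue takes 11 characters here, against 16 in hex. Feel free to add more, so long as they are legal characters in your file system. Bear in mind that Windows file names are case-insensitive by default, so two results that differ only in letter case would name the same file; if that matters, drop one letter case from the list.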
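
The question also asks for the string to be mapped back to a number, which the method above does not cover. Below is a minimal sketch of the reverse conversion, assuming the same 73-character list. The FileNameCodec class and the Encode/Decode names are hypothetical, and Encode is repeated only so the block is self-contained. Note that this Encode emits the first character of the list for 0 rather than an empty string, so that every value round-trips.

```csharp
using System;
using System.Text;

public static class FileNameCodec
{
    // Same character list as in the answer; its order defines the digit values.
    private const string Alphabet =
        "qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM1234567890!@#$%^&()_-";

    public static string Encode(ulong number)
    {
        ulong radix = (ulong)Alphabet.Length;
        var buffer = new StringBuilder();
        do
        {
            // Least significant digit first, so prepend each character.
            buffer.Insert(0, Alphabet[(int)(number % radix)]);
            number /= radix;
        } while (number != 0);
        return buffer.ToString();
    }

    public static ulong Decode(string text)
    {
        if (string.IsNullOrEmpty(text))
            throw new FormatException("Nothing to decode.");

        ulong radix = (ulong)Alphabet.Length;
        ulong result = 0;
        foreach (char c in text)
        {
            int digit = Alphabet.IndexOf(c);
            if (digit < 0)
                throw new FormatException($"Unexpected character '{c}'.");
            // checked: raise OverflowException instead of silently wrapping
            // when the text encodes a value larger than UInt64.MaxValue.
            result = checked(result * radix + (ulong)digit);
        }
        return result;
    }
}
```

To use it, strip the extension from the file name (the list contains no '.' character, so the extension is unambiguous) and pass the rest to Decode. Looking each character up with IndexOf is linear in the size of the list; for a list this short that is unlikely to matter, but a lookup table would be the obvious change if it did.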