
MD5 with Full Charset, Make String Shorter, but how?


I know I can compute an MD5 hash easily in C#. The code I use is below:

using System.Security.Cryptography;

public static string MD5(string input)
{
    byte[] bs = System.Text.Encoding.UTF8.GetBytes(input);
    // Hash the UTF-8 bytes; an MD5 hash is always 16 bytes long.
    using (MD5CryptoServiceProvider x = new MD5CryptoServiceProvider())
    {
        bs = x.ComputeHash(bs);
    }
    System.Text.StringBuilder s = new System.Text.StringBuilder(32);
    foreach (byte b in bs)
    {
        // "x2" already yields two lowercase hex digits per byte.
        s.Append(b.ToString("x2"));
    }
    return s.ToString();
}

Normally, an MD5 string is 32 characters long and uses only 0-9 and a-f. I was wondering: if I used 0-9, A-Z, a-z and a few characters such as _ . -, which gives a set of about 64 characters, the MD5 string would be significantly shorter. I will use the MD5 string to identify something and save it in a database. A shorter string can do the same job while taking less space and less time to index.

So my question is: does anyone have a fast method to convert a 0-9a-f string to a 0-9A-Za-z._ string?

P.S. My basic idea is to convert the MD5 string to an integer (which will be very big), then convert that integer to a string in any character set. I imagine it will be slow.

Any ideas would help. Thanks.
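
The big-integer idea from the P.S. could be sketched roughly as follows. This is only an illustration, not a tuned implementation: the class name `Md5BigIntegerSketch`, the `Alphabet` constant, and the `Encode` and `Md5Short` helpers are made up for this example, and it assumes `System.Numerics.BigInteger` is available. Note that `BigInteger` reads the bytes in little-endian order, so the result does not correspond digit-for-digit to the hex string, which does not matter when the value is only used as an identifier.

```csharp
using System;
using System.Numerics;
using System.Security.Cryptography;
using System.Text;

public static class Md5BigIntegerSketch
{
    // Hypothetical 64-character alphabet chosen for illustration (URL-safe).
    private const string Alphabet =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_";

    // Treats the hash as one large non-negative integer and writes it in base 64.
    public static string Encode(byte[] hash)
    {
        // BigInteger(byte[]) expects little-endian two's complement;
        // the extra trailing zero byte keeps the value non-negative.
        byte[] unsigned = new byte[hash.Length + 1];
        Array.Copy(hash, unsigned, hash.Length);
        BigInteger value = new BigInteger(unsigned);

        // Each character carries 6 bits, so 128 bits need ceil(128 / 6) = 22 characters.
        int length = (hash.Length * 8 + 5) / 6;
        char[] result = new char[length];
        for (int i = length - 1; i >= 0; i--)
        {
            result[i] = Alphabet[(int)(value % 64)];
            value /= 64;
        }
        return new string(result);
    }

    // Hashes the UTF-8 bytes of the input and returns a fixed-length 22-character string.
    public static string Md5Short(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Encode(md5.ComputeHash(Encoding.UTF8.GetBytes(input)));
        }
    }
}
```

Calling `Md5BigIntegerSketch.Md5Short("some key")` always returns 22 characters, because small values are padded on the left with the alphabet's first character. For a 16-byte input the division loop runs only 22 times, so the cost is unlikely to matter next to the hash itself, though that has not been measured here.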

[EDIT]

Thank you, everyone. In the end I rolled my own. I need a string because it will also be part of a URL. I don't use Base64 because I want a fixed-length string.

The hex form of MD5 uses a 16-character set and is 32 characters long; when I switch to a 64-character set, the length becomes 20.

Here is the updated function I use.

public static string MD520(string input)
{
    // 64-character, URL-safe alphabet: each character carries 6 bits.
    char[] chars = new char[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '-', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z' };
    byte[] bs;
    using (MD5CryptoServiceProvider x = new MD5CryptoServiceProvider())
    {
        bs = x.ComputeHash(System.Text.Encoding.UTF8.GetBytes(input));
    }
    System.Text.StringBuilder s = new System.Text.StringBuilder(20);
    int c = 0;   // accumulates up to three bytes (24 bits)
    byte m = 0;  // number of bytes currently held in c
    foreach (byte b in bs)
    {
        // Pack the byte into the low 8 bits so that each group of
        // three bytes maps to a unique 24-bit value.
        c = (c << 8) | b;
        m++;
        if (m == 3)
        {
            // Emit the 24 bits as four 6-bit characters, lowest bits first.
            for (int i = 0; i < 4; i++)
            {
                s.Append(chars[c % 64]);
                c /= 64;
            }
            m = 0;
        }
    }
    // 16 bytes give five full groups of three (20 characters);
    // the 16th byte is left over and is not encoded.
    return s.ToString();
}

Solution

  • Instead of

        System.Text.StringBuilder s = new System.Text.StringBuilder();
        foreach (byte b in bs)
        {
            s.Append(b.ToString("x2").ToLower());
        }
        return s.ToString();
    

    You could use

        return Convert.ToBase64String(bs);
    

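    This produces a shorter string (24 characters for a 16-byte MD5 hash, including two trailing = padding characters), but it uses the standard Base64 alphabet rather than a character set of your choosing. For a different alphabet you'd have to write your own custom encoding, though I wouldn't doubt someone out there already has.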
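
If the goal is a fixed-length, URL-friendly string, one common approach is to post-process the Base64 output rather than write an encoder from scratch. The sketch below assumes the input is always a 16-byte MD5 hash; the class name `Md5UrlSafeSketch` and the helpers `ToUrlSafeBase64` and `Md5UrlSafe` are illustrative names, not part of any library. Because Base64 of 16 bytes is always 24 characters ending in `==`, stripping the padding leaves exactly 22 characters every time.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Md5UrlSafeSketch
{
    // Converts standard Base64 into the URL-safe variant:
    // '+' becomes '-', '/' becomes '_', and '=' padding is dropped.
    public static string ToUrlSafeBase64(byte[] hash)
    {
        return Convert.ToBase64String(hash)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // Hashes the UTF-8 bytes of the input; the result is 22 characters for MD5.
    public static string Md5UrlSafe(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            return ToUrlSafeBase64(md5.ComputeHash(Encoding.UTF8.GetBytes(input)));
        }
    }
}
```

Unlike a 20-character encoding, which can hold only 120 bits, the 22-character form keeps all 128 bits of the hash. The resulting alphabet is A-Z, a-z, 0-9, - and _, which is safe in a URL path or query string without escaping.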