Search code examples
javascriptarraysradix

How to convert base13 string to base64


I have to make a URL shortener for query strings. Have spent few days trying to compress array data into base64 strings. Thinking that the best approach may be to interpret something like "[[1,2,9,3],[1,0,2],[39,4]]" as base13 with numbers 0-9 and [], symbols.

how the current algorithm works: convert the stringified arrays into an array of base13, where each element represents 1 unique character, convert this array to base10 number, convert this number to base 64 string.

but the problem is when converting the base13 array to base10 number, it makes large numbers like 5.304781188371057e+86 which cant be held in js.

I am open to alternative solutions of course, but please do not suggest something like creating a database of URLs as it won't work as I have up to 51!*51! unique URLs, better to just make a compact encodable and decodable query string and decode it as soon as the website is accessed.

//convert stringified array to array of base13(each element = each digit of base13 number)
function stringToArray(string)
{
    let charSet = "[],1234567890";
    let array = [];
    for(let i = 0; i < string.length; i++)
    {
        array.push(charSet.indexOf(string[i]));
    }
    return array;
}

//convert base13 array to one large decimal number
function arrayToDecimal(array, base)
{
    var decimal = 0;
    for(let i = 0; i < array.length; i++)
    {
        decimal += array[i] * Math.pow(base, i)
    }
    return decimal;
}

//convert decimal number back to array
function decimalToArray(decimal, base)
{
    var quotient = decimal;
    var remainder = [];
    while(quotient > base)
    {
        remainder.push(quotient % base)
        quotient = Math.floor(quotient / base);
    }
    remainder.push(quotient % base)
    return remainder;
}

const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// binary to string lookup table
const b2s = alphabet.split('');

// string to binary lookup table
// 123 == 'z'.charCodeAt(0) + 1
const s2b = new Array(123);
for(let i = 0; i < alphabet.length; i++)
{
    s2b[alphabet.charCodeAt(i)] = i;
}

// number to base64
const ntob = (number) =>
{
    if(number < 0) return `-${ntob(-number)}`;

    let lo = number >>> 0;
    let hi = (number / 4294967296) >>> 0;

    let right = '';
    while(hi > 0)
    {
        right = b2s[0x3f & lo] + right;
        lo >>>= 6;
        lo |= (0x3f & hi) << 26;
        hi >>>= 6;
    }

    let left = '';
    do {
        left = b2s[0x3f & lo] + left;
        lo >>>= 6;
    } while(lo > 0);

    return left + right;
};

// base64 to number
const bton = (base64) =>
{
    let number = 0;
    const sign = base64.charAt(0) === '-' ? 1 : 0;

    for(let i = sign; i < base64.length; i++)
    {
        number = number * 64 + s2b[base64.charCodeAt(i)];
    }

    return sign ? -number : number;
};



console.log(decimalToArray(bton(ntob(arrayToDecimal([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 13))), 13)) 
//encoded and decoded, works output:[1,1,1,1,1,1,1,1,1,1,1,1,1]
console.log(arrayToDecimal([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 13)) 
//encoding doesnt work, array to decimal converts to 5.304781188371057e+86```

Solution

  • An interesting problem... The first thing you will need to assess is whether the base conversion compression you're seeking is worthwhile. Ie, how many base 64 characters are required to represent n characters of base 13? This involves solving...

    13 ** n = 64 ** x
    

    Solving for x, we get...

     x = n * log(13) / log(64)
    

    Ie, for every n digits of base 13, how many digits of base 64 are required. A sampling of a few values of n returns...

    • n = 6, x = 3.70
    • n = 7, x = 4.31
    • n = 8, x = 4.93
    • n = 9, x = 5.55
    • n = 10, x = 6.17
    • n = 11, x = 6.78
    • n = 12, x = 7.40
    • n = 13, x = 8.01
    • n = 14, x = 8.63
    • n = 15, x = 9.25
    • n = 16, x = 9.86

    So how to interpret this? If you have 10 digits of base 13, you're going to need 7 digits (6.17 rounded up) of base 64. So the best ratio is when x is equal to, or just under, a whole number. So, 8 digits of base 13 requires 5 digits of base 64, achieving a best case of 5/8 or 62.5% compression ratio.

    Assuming that's good enough to meet your requirement, then the following function converts the "base13" string to base 64.

    const base13Chars = "0123456789[],";
    const base64Chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_';  
    // see https://en.wikipedia.org/wiki/Query_string for URL parameter allowable characters.
    
    function base13toBase64(x13) {
    
        base13 = x13.split("").map( c => base13Chars.indexOf(c) );
    
        // Make the array an even multiple of 8
        for (i = base13.length; i % 8 !==0; i++) {
            base13[i] = 0;
        }
    
        x64 = "";
        for (i = 0; i < base13.length; i += 8) {
            // Calculate base13 value of the next 8 characters.
            let n = 0;
            for (j = 0; j < 8; j++) {
                n = n * 13 + base13[i + j];
            }
            // Now calculate the base64 of n.
            for (j = 0; j < 5; j++) {
                x64 = x64 + base64Chars.substr(n % 64,1);
                n = Math.floor(n / 64);
            }   
        }
    
        return x64;
    }
    

    Running the above...

     base13toBase64( "[[1,2,9,3],[1,0,2],[39,4]]" ) returns "ilYKerYlgEJ4PxAAjaJi"
    

    Note that the original value is a length of 26 characters, and the base64 value is 20 characters, so the compression ratio is 77%, not quite the optimal 62.5%. This is because of the padding to bring the original array to 32 characters, an even multiple of 8. The longer the string to encode, though, the closer the ratio will be to 62.5%.

    Then, on the server side you'll need the constants above plus the following function to "uncompress" the base64 to the base13 stringified URL...

    function base64toBase13(x64) {
    
        base64 = x64.split("").map( c => base64Chars.indexOf(c) );
    
        x13 = "";
        for (i = 0; i < base64.length; i += 5) {
            // Calculate base64 value of the next 5 characters.
            let n = 0;
            for (j = 5 - 1; 0 <= j; j--) {
                n = n * 64 + base64[i + j];
            }
            // Now calculate the base13 of n.
            let x = "";
            for (j = 0; j < 8; j++) {
                x = base13Chars.substr(n % 13,1) + x;
                n = Math.floor(n / 13);
            }
            x13 = x13 + x;
        }
    
        // Removed the trailing 0's as a result of the buffering in
        // base13toBase64 to make the array an even multiple of 8.
        while (x13.substr(-1,1) === "0") {
            x13 = x13.substr(0, x13.length - 1);
        }
    
        return x13;
    }
    

    Running the above...

     base64toBase13 ( "ilYKerYlgEJ4PxAAjaJi" ) returns "[[1,2,9,3],[1,0,2],[39,4]]"
    

    Hope this helps...