Search code examples
algorithmdata-conversionbase-conversion

Algorithm to convert from radix 256 to multi-radix and back


I have a data stream of bytes, also known as radix 256 symbols. What is the best algorithm to convert that, ideally on the fly, to a new stream of symbols where the radix of each symbol varies and is only known at runtime? The lengths of the input byte stream and the target radix list are both long but finite. All non-negative integers, no floating point. Additionally, the target radix cannot be guaranteed to evenly divide or be a multiple of 256.


Solution

  • Your problem is a subset of Arithmetic Encoding, which is used as the last stage in many compression algorithms. It's one of the coolest things to learn in CS:

    http://www.drdobbs.com/cpp/data-compression-with-arithmetic-encodin/240169251 https://en.wikipedia.org/wiki/Arithmetic_coding

    How your problem specifically relates:

    The encoder you want is an arithmetic decoder, and for each decoding you will use a different sized alphabet (the radix) with equal probabilities for all symbols.

    The main loop of your encoder will do something like this:

    int val=0; //information from the stream
    int range=1; //val is in [0,range)
    while(...)
    {
        int radix = next_radix();
        //ensure adequate efficiency
        while(range < radix*256)
        {
            val = (val<<8)|(next_byte()&255);
            range<<=8;
        }
        int output = (int)(radix*(long)val/range);
        //find the smallest possible val that produces this output
        int low = (int)((output*(long)range+radix-1)/radix);
        //find the smallest possible val that produces the next output
        int high = (int)(((output+1)*(long)range+radix-1)/radix);
        val-=low;
        range = high-low;
        write(output);
    } 
    

    There are complications with handling the termination conditions and handling carries in your decoder (the arithmetic encoder), so you'll have to read the literature, starting with the stuff I linked. I hope this gives you the flavour of how it works, though.

    Good luck