Search code examples
optimizationstoragebitradix

Store non-binary values into a unique integer


In 8 bits we can store 8 numbers from 0 to 1 each. We can also say that we can store 8 different piece of data in a range from 0 to 255.

 0/1   0/1   0/1   0/1   0/1   0/1   0/1   0/1  → 8 different piece of data
  ↓     ↓     ↓     ↓     ↓     ↓     ↓     ↓
[bit] [bit] [bit] [bit] [bit] [bit] [bit] [bit] → 8 bits total

If instead of storing 0 or 1 we need to store 0, 1 or 2, then we will end up using more bits, of course. In this condition, by the way, we can store 0, 1, 2 or 3. However, this is beyond what I need but technically it will use the same amount of data space.

 0/1/2   0/1/2   0/1/2   0/1/2  → 4 different piece of data (expectative)
0/1/2/3 0/1/2/3 0/1/2/3 0/1/2/3 → 4 different piece of data (too much for me)
   ↓       ↓       ↓       ↓
[2bits] [2bits] [2bits] [2bits] → 8 bits total

We can agree that we are "losing" storage capacity. The solution is to do a radix conversion (I don't know if that's the right term) using the numerical base of 3, this way we will achieve the objective of storing exclusively 0, 1 or 2, and in the same 8 bits we can store more information (and, to be honest, I don't know how the bits organize themselves for this to work).

Examples:

00000₃ → int 0
01212₃ → int 50
11111₃ → int 121
12121₃ → int 151
22222₃ → int 242 (max)

In this conversion, we were able to store 5 pieces of information composed of 0, 1 or 2, instead of just 4. And there's still a "space" left, and that's what I'd like to talk about.

I was wondering if there is any way to store mixed base data instead of fixed base (eg. 3, as exemplified above). To be clearer, supposing that I wanted to store two pieces of data composed of 0, 1 or 2 each, and one piece of data composed of a number between 0 and 8. In binary, and still limited to 256, we can do it as follows:

 0/1/2   0/1/2    0/1/2/3/4/5/6/7/8  → 3 piece of different data
   ↓       ↓              ↓
[2bits] [2bits] [      4 bits      ] → 8 bits total

But, it seems to me that we have empty "spaces" left again, since it doesn't seem to me to be using the full capacity possible in 8 bits, and maybe it's still possible to add more data if we use number base conversion instead.

The problem is: I don't know how to handle the data in this way, in a way to reliably merge and split it again.

In the real world: I need to transfer a lot of data that uses very little information (eg numbers 0 to 2, 0 to 5, 0 to 10, etc.). And I'm currently doing bitwise snapping of this data. And in my case, any optimized byte is a very nice gain (if this is really possible, maybe there is a 20~40% data saving).

I understand that it might consume more processing on rebasing, merge and split conversions, but that won't be an issue, because this processing will be done on the client side (merge when sending and split when receiving), and the server manages the same data already optimized (no additional processing).

--

Possible solutions:

Let's imagine that I have a set of three numbers that can be 0, 1 or 2, and another set of three numbers that can go from 0 to 8 (so it is 9 possibilities each). The goal is to merge all these numbers into a single integer (which should use about 2 bytes at best current method).

Solution 1: each number being stored in a single complete byte:

byte A from 0 to 2
byte B from 0 to 2
byte C from 0 to 2
byte D from 0 to 8
byte E from 0 to 8
byte F from 0 to 8

Problem: it will be necessary to consume 6 bytes to store these 6 numbers.

Solution 2: storage through bits:

2 bits to store A
2 bits to store B
2 bits to store C
4 bits to store F
4 bits to store G
4 bits to store H
6 unused bits to complete the last byte

Problems: in addition to 2 bits being "more than necessary" to store numbers from 0 to 2 (A, B and C), and same for 4 bits from 0 to 8, we will use a total of 3 bytes and still have 6 "dead" (unused) bits at end to complete the last byte.

Solution 3: Separate into two sets and convert their respective bases individually:

A, B and C to base 3 consumes 1 byte
D, E and F to base 9 consumes 2 bytes

Problem: despite being a great solution at the moment (and despite the example above, in more complex situations it can be more optimized than solution 2), and that's what I'm using, I believe there's a lot of "left over" space in this union, and maybe it's still possible to squeeze everything into 2 bytes only.

For example, the conversion of 222₃ consumes up to 5 bits, which means that in this byte we still have 3 unused bits. The 888₉ conversion consumes 10 bits. This makes me see the possibility of using only 15 bits with only 1 unused bit (2 bytes).

Then we come to the next solution.

Solution 4: move the bits to further optimize space:

higher bits stores A, B and C
lower bits stores D, E and F

Example:

[5 bits of numbers of base 3] +
[10 bits of numbers of base 9] +
[1 unused bit]

Problem: currently this solution is even better than the one I am currently using. However, I still see a possibility to improve in situations where more information may be available through number base conversion and union.


Solution

  • You pack your values in the following way:

    Example:

    possible values for

    a = [0,2] -> 3 states
    b = [0,2] -> 3 states
    c = [0,8] -> 9 states
    

    lets assume you have following values

    a := 1
    b := 2
    c := 8
    

    than you can calculate the final number the following way

    int number = 0; 
    int multi = 1; 
    number = number.add(multi.multiply(a));
    multi = multi.multiply(3)); 
    number = number.add(multi.multiply(b));
    multi = multi.multiply(3));
    number = number.add(multi.multiply(c));
    

    number holds now your packed numbers

    to unpack you just need todo

    a = number.mod(3)
    number = number.divide(3)
    
    b = number.mod(3)
    number = number.divide(3)
    
    c = number.mod(9)