Tags: arrays, c, floating-point, probability, discrete-space

Mutate an array of discrete probabilities by excluding one value in C


I am working on a project in C where I want to progressively mutate a uint32_t under the following conditions:

  1. The probability of flipping a bit starts at 1/2 for the least significant bit (LSB), then 1/4 for the next bit to the left, 1/8 for the next, and so on (see example array).
  2. After a bit k is flipped, the value of probability(k) is redistributed to all other bits according to the distribution laid out in step one.
  3. probability(k) is then set to zero.

I imagine these probabilities are best stored in a length-32 array of doubles, so a very useful answer would be a function that accepts a length-32 array of doubles and the index of a bit to exclude, and returns the modified length-32 array.

Can this be accomplished by generating a length-31 array excluding k with the procedure from step 1, multiplying each value by the value of array[k], then creating a length-32 array with array[k] = 0 and adding that to the input array (after setting input[k] = 0)?

A problem I imagine might happen but which I'm unsure how to solve:

  • In step one, the probabilities are all (1) large enough to be represented by doubles and (2) powers of 2, so they are represented exactly. However, there is no good reason why they would remain so after redistribution. The example array below has an exact sum (1 - 2^-32, just short of one) only because every entry is exactly representable, and I have no reason to assume that will hold for other values. It is unclear to me how to keep choosing bits in a way that is equivalent to drawing from a distribution that does sum to one.

Answers

The solution has to be in C because the rest of the code in the project is. Sorry, I'm sure there are very cool ways to solve this in other languages. Probably the binomial package in R will have something that does this, but that doesn't help. A C-like language which I can manually adapt code to work in C is also fine.

I'm on a desktop computer otherwise in control of the development environment, so any libraries which would make this easy are welcome. Thanks. Also I don't expect any performance constraints so code that is slow or needs to store tables and such is fine.

My example here uses doubles but that's not definitive. I'm coming here asking the question because I don't know how to do this. If you have an answer which works with integers entirely then I would love to see that.

example array

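#include <math.h>  /* for pow() */

/* Fill array with array[i] = 2^-(32 - i), so array[0] = 2^-32
   and array[31] = 1/2. */
void create_array32(double array[32]) {
    int i;
    for (i = 0; i < 32; i++) {
        array[i] = pow(2, -(32 - i));
    }
}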
// The output, if that is easier to work with
double example[32] = {
0.0000000002328306, 0.0000000004656613,
0.0000000009313226, 0.0000000018626451,
0.0000000037252903, 0.0000000074505806,
0.0000000149011612, 0.0000000298023224,
0.0000000596046448, 0.0000001192092896,
0.0000002384185791, 0.0000004768371582,
0.0000009536743164, 0.0000019073486328,
0.0000038146972656, 0.0000076293945312,
0.0000152587890625, 0.0000305175781250,
0.0000610351562500, 0.0001220703125000,
0.0002441406250000, 0.0004882812500000,
0.0009765625000000, 0.0019531250000000,
0.0039062500000000, 0.0078125000000000,
0.0156250000000000, 0.0312500000000000,
0.0625000000000000, 0.1250000000000000,
0.2500000000000000, 0.5000000000000000};

Solution

  • Instead of maintaining an array of probabilities, maintain a corresponding array of selection frequencies:

    uint32_t frequencies[32];
    
    for (int i = 0; i < 32; i++) {
        frequencies[i] = (uint32_t) 1 << (31 - i);
    }
    

    If you like, you could pre-compute these starting frequencies and put them in an initializer instead of computing them at runtime.
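As a rough illustration of the pre-computed variant, the starting frequencies could be written out as a constant table and copied whenever a fresh start is needed. The names `initial_frequencies` and `reset_frequencies` are hypothetical and do not appear in the answer above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical table: entry i holds 2^(31 - i), the same values the
   loop above computes at runtime. */
const uint32_t initial_frequencies[32] = {
    0x80000000u, 0x40000000u, 0x20000000u, 0x10000000u,
    0x08000000u, 0x04000000u, 0x02000000u, 0x01000000u,
    0x00800000u, 0x00400000u, 0x00200000u, 0x00100000u,
    0x00080000u, 0x00040000u, 0x00020000u, 0x00010000u,
    0x00008000u, 0x00004000u, 0x00002000u, 0x00001000u,
    0x00000800u, 0x00000400u, 0x00000200u, 0x00000100u,
    0x00000080u, 0x00000040u, 0x00000020u, 0x00000010u,
    0x00000008u, 0x00000004u, 0x00000002u, 0x00000001u
};

/* Restore a working copy to the starting distribution. */
void reset_frequencies(uint32_t frequencies[32])
{
    memcpy(frequencies, initial_frequencies, sizeof initial_frequencies);
}
```

Keeping the table `const` and working on a copy means bits that have been excluded can be restored without recomputing anything.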

    Each time you want to make a selection,

    1. Compute an array of the cumulative sums of the frequencies:

      uint32_t cumulative[33] = {0};
      
      for (int i = 0; i < 32; i++) {
          cumulative[i + 1] = cumulative[i] + frequencies[i];
      }
      
    2. Generate a (uniformly distributed) random number x between 0 (inclusive) and cumulative[32] (exclusive).

    3. Find the value n such that cumulative[n] <= x && x < cumulative[n + 1]. This n is the selected bit number. You could use a binary search, but a linear search would be simpler, and for only 32 items, about as fast.
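The three steps above might be put together roughly as follows. This is only a sketch: the function names (`random_u32`, `uniform_below`, `build_cumulative`, `find_bit`, `pick_bit`) are made up for illustration, and `rand()` is used purely as a stand-in random source. The C standard only guarantees 15 bits per `rand()` call, so the sketch stitches several calls together and uses rejection sampling to avoid modulo bias; a better generator would be a sensible swap in real code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assemble 32 random bits from rand(), which is only guaranteed to
   provide 15 bits per call (RAND_MAX >= 32767). Stand-in source only. */
uint32_t random_u32(void)
{
    uint32_t r = 0;
    for (int i = 0; i < 3; i++) {
        r = (r << 15) | ((uint32_t)rand() & 0x7FFFu);
    }
    return r;
}

/* Step 2: uniform value in [0, total). Requires total > 0.
   Values below (2^32 mod total) are rejected so that every residue
   is equally likely. */
uint32_t uniform_below(uint32_t total)
{
    uint32_t threshold = (UINT32_MAX - total + 1u) % total;
    uint32_t r;
    do {
        r = random_u32();
    } while (r < threshold);
    return r % total;
}

/* Step 1: running sums, with cumulative[0] = 0 and cumulative[32]
   equal to the total of all frequencies. */
void build_cumulative(const uint32_t frequencies[32], uint32_t cumulative[33])
{
    cumulative[0] = 0;
    for (int i = 0; i < 32; i++) {
        cumulative[i + 1] = cumulative[i] + frequencies[i];
    }
}

/* Step 3: linear search for n with cumulative[n] <= x < cumulative[n + 1].
   Returns -1 if x is not below the total. */
int find_bit(const uint32_t cumulative[33], uint32_t x)
{
    for (int n = 0; n < 32; n++) {
        if (cumulative[n] <= x && x < cumulative[n + 1]) {
            return n;
        }
    }
    return -1;
}

/* All three steps; returns -1 when every frequency is already zero. */
int pick_bit(const uint32_t frequencies[32])
{
    uint32_t cumulative[33];
    build_cumulative(frequencies, cumulative);
    if (cumulative[32] == 0) {
        return -1;
    }
    return find_bit(cumulative, uniform_below(cumulative[32]));
}
```

An index whose frequency is zero has `cumulative[n] == cumulative[n + 1]`, so the search can never land on it. The caller is expected to seed the generator once (for example with `srand`) before the first call.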

    To remove bit n from further consideration, just set its frequency to 0:

    frequencies[n] = 0;
    

    When you compute the new cumulative sums for the next selection, that will naturally both exclude n from consideration and, by computing a revised total, adjust the probabilities of all the remaining options.
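To make the renormalization concrete, here is a small sketch that flips a chosen bit, zeroes its frequency, and reports the resulting probabilities as doubles. The helper names (`total_frequency`, `bit_probability`, `flip_and_exclude`) are hypothetical, and the sketch assumes that index n in `frequencies` corresponds to bit n of the value, with index 0 being the LSB.

```c
#include <stdint.h>

/* Sum of the remaining frequencies. Starting from the initial values
   the sum is at most 2^32 - 1, so it fits in a uint32_t. */
uint32_t total_frequency(const uint32_t frequencies[32])
{
    uint32_t total = 0;
    for (int i = 0; i < 32; i++) {
        total += frequencies[i];
    }
    return total;
}

/* Current probability of selecting index i: its share of the total.
   Returns 0.0 when nothing is left to select. */
double bit_probability(const uint32_t frequencies[32], int i)
{
    uint32_t total = total_frequency(frequencies);
    if (total == 0) {
        return 0.0;
    }
    return (double)frequencies[i] / (double)total;
}

/* Flip bit n of *value and remove index n from future selections.
   Assumes index n maps to bit n, with n = 0 the LSB. */
void flip_and_exclude(uint32_t *value, uint32_t frequencies[32], int n)
{
    *value ^= (uint32_t)1 << n;
    frequencies[n] = 0;
}
```

With the starting frequencies, excluding index 0 drops the total from 2^32 - 1 to 2^31 - 1, so index 1 goes from roughly 1/4 to roughly 1/2 and every other remaining index scales up by the same factor. The probabilities are only ever derived from the integer frequencies, so no rounding error accumulates in the stored state.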