Tags: java, random, bit-manipulation, computer-science

Choosing a sequence of bitwise operations


Would it be possible to choose a sequence of bitwise operations on randomly generated 64-bit numbers such that the bits in the end result have a given probability of being set? My goal is to write a function which takes a double in the range [0, 1) and generates a long with bits in that proportion by only using the bitwise operations OR and AND on randomly generated longs.

My thought process is that this reduces to minimizing the error |p - (2^n - 1)/2^m|, where p is the desired ratio, m is the total number of random longs combined and n is the number of them that were OR'ed. This is because the resulting ratio after AND'ing k longs together is 1/2^k, while the ratio after OR'ing n longs together is 1 - 1/2^n, so the ratio after doing both is (2^n - 1)/2^(n+k). This is where I'm stuck, though, since I don't know how to go about finding the best values of n and k.


Solution

  • If I've understood your problem statement correctly, this sounds a lot like Russian peasant multiplication. Consider that a randomly generated long will have each bit independent and each bit set with probability 50%. Suppose (for the sake of recursion) we have your desired function f(p); then

    rand()       gives the same distribution as f(0.5)
    f(p) & f(q)  gives the same distribution as f(p*q)
    f(p) | f(q)  gives the same distribution as f(p + q - p*q)
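    These identities are easy to sanity-check empirically. A minimal sketch (the helper name densities is mine): sample many random 8-bit pairs, AND and OR them, and compare the measured fraction of set bits against p*q = 0.25 and p + q - p*q = 0.75 for p = q = 0.5.

```c
#include <stdlib.h>

/* Measure the fraction of set bits in a & b and in a | b over many
   random 8-bit samples. With fair bits (p = q = 0.5) these should
   approach 0.5 * 0.5 = 0.25 and 0.5 + 0.5 - 0.25 = 0.75. */
void densities(int trials, double *p_and, double *p_or) {
    long set_and = 0, set_or = 0;
    for (int t = 0; t < trials; t++) {
        unsigned a = rand() % 256, b = rand() % 256;
        for (int i = 0; i < 8; i++) {
            set_and += ((a & b) >> i) & 1;
            set_or  += ((a | b) >> i) & 1;
        }
    }
    *p_and = (double)set_and / (8.0 * trials);
    *p_or  = (double)set_or  / (8.0 * trials);
}
```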
    

    So suppose we're looking for f(0.9). We know that

    f(0.9) = f(0.5) | f(x) where 0.5 + x - 0.5*x = 0.9, i.e. 0.5x = 0.4
    

    So now we're looking for f(0.8). We know that

    f(0.8) = f(0.5) | f(x) where 0.5x = 0.3
    f(0.6) = f(0.5) | f(x) where 0.5x = 0.1
    f(0.2) = f(0.5) & f(x) where 0.5x = 0.2
    f(0.4) = f(0.5) & f(x) where 0.5x = 0.4
    f(0.8) = f(0.5) | f(x) where 0.5x = 0.3
    [...]
    

    This recursion will never actually finish; but at any point we like, we can bottom out and start re-winding the stack.

    f(0.9) = R | R | R | (R & R & (R | R | R | (R & R & ...)))
    

    For example, this C code gives a distribution pretty darn close to p=0.9:

    #include <stdlib.h>  // for rand()

    #define R (rand() % 256)
    unsigned get09() {
      unsigned x = R;  // 0.5
      x |= R; // 0.75
      x |= R; // 0.875
      x &= R; // 0.4375
      x &= R; // 0.21875
      x |= R; // 0.609375
      x |= R; // 0.8046875
      x |= R; // 0.90234375
      return x;
    }
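    How close is "pretty darn close"? A self-contained harness (it repeats get09 from above so it compiles on its own) measures the overall fraction of set bits, which should come out near 0.90234375:

```c
#include <stdlib.h>

#define R (rand() % 256)   /* 8 fair random bits */

/* get09 repeated from above so this check is self-contained. */
unsigned get09(void) {
    unsigned x = R;  /* 0.5 */
    x |= R;          /* 0.75 */
    x |= R;          /* 0.875 */
    x &= R;          /* 0.4375 */
    x &= R;          /* 0.21875 */
    x |= R;          /* 0.609375 */
    x |= R;          /* 0.8046875 */
    x |= R;          /* 0.90234375 */
    return x;
}

/* Fraction of set bits over many trials; expect ~0.90234375. */
double measured_density(int trials) {
    long set = 0;
    for (int t = 0; t < trials; t++) {
        unsigned v = get09();
        for (int i = 0; i < 8; i++) set += (v >> i) & 1;
    }
    return (double)set / (8.0 * trials);
}
```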
    

    Now, could you get even closer by using productions other than directly with R — for example, could we use the fact that f(0.9) = f(0.684) | f(0.684), and then look for ways to produce f(0.684)? Sure.

    Now, maybe this is the Baader-Meinhof effect in action, but this actually (serendipitously!) feels just like a directed hypergraph "tersest path" problem akin to https://mathoverflow.net/questions/466176/what-is-the-proper-name-for-this-tersest-path-problem-in-infinite-craft — see https://quuxplusone.github.io/blog/2024/03/03/infinite-craft-theory/ and Knuth Volume 2 §4.6.3 "Evaluation of Powers" for some theory. We're looking for the tersest path from V₀ = {0.5} to p, where the edge relation is really two possible operations: & and |. So from {0.5} we can reach any of {0.5, 0.25, 0.75} in one step; we can reach any of {0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875, 0.1875, 0.8125} in two steps; and so on; and you just want to know how many steps it'll take before we've reached some number within a certain epsilon of p.
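    That search is easy to prototype by brute force. A sketch under the round-based model just described (all names are my own): each round combines every pair of already-reachable probabilities with both & and |, and we report the first round in which something lands within eps of the target.

```c
#include <math.h>

#define MAXVALS 4096

static double vals[MAXVALS];
static int nvals;

/* Has probability v already been reached (up to rounding error)? */
static int seen(double v) {
    for (int i = 0; i < nvals; i++)
        if (fabs(vals[i] - v) < 1e-12) return 1;
    return 0;
}

/* How many rounds of pairwise &/| combinations, starting from {0.5},
   until some reachable probability lies within eps of target?
   Returns -1 if not reached within maxrounds. */
int rounds_to_reach(double target, double eps, int maxrounds) {
    nvals = 0;
    vals[nvals++] = 0.5;
    for (int r = 1; r <= maxrounds; r++) {
        int old = nvals;
        for (int i = 0; i < old; i++)
            for (int j = 0; j < old; j++) {
                double a = vals[i] * vals[j];                     /* AND */
                double o = vals[i] + vals[j] - vals[i] * vals[j]; /* OR  */
                if (!seen(a) && nvals < MAXVALS) vals[nvals++] = a;
                if (!seen(o) && nvals < MAXVALS) vals[nvals++] = o;
            }
        for (int i = 0; i < nvals; i++)
            if (fabs(vals[i] - target) <= eps) return r;
    }
    return -1;
}
```

    With this, 0.75 falls out in one round and 0.875 in two, matching the sets listed above; getting within 0.01 of 0.9 takes a third round (e.g. 0.875 | 0.25 = 0.90625).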


    EDIT: It occurs to me that those targets are much better represented in base 2, not base 10. From {0.1} (base 2) we can reach any of {0.1, 0.01, 0.11} in one step; any of {0.1, 0.01, 0.11, 0.001, 0.101, 0.011, 0.111, 0.0011, 0.1101} in two steps; and so on. Of course it's going to be impossible to reach any p whose binary representation has an infinite number of 1-bits (such as p = 0.9 = 0.1110011001...₂); but we can get arbitrarily close.

    And it sure looks like the general solution is (just like Russian peasant multiplication) to write the target in binary; then turn the 1s into |s and the 0s into &s. For example, take p = 0.7₁₀, whose binary expansion is 0.10110011001...₂; we can aim for the nearby nine-digit value 0.101100111₂ ≈ 0.70117 and write this, where reading upward from bottom to top the bitwise operations are |&||&&|||. Each bitwise operation with 0.5₁₀ adds one more leading bit to the result.

    #include <stdlib.h>  // for rand()

    #define R (rand() % 256)
    unsigned get07() {
      unsigned x = R | R;  // 0.11
      x |= R; // 0.111
      x &= R; // 0.0111
      x &= R; // 0.00111
      x |= R; // 0.100111
      x |= R; // 0.1100111
      x &= R; // 0.01100111
      x |= R; // 0.101100111
      return x;
    }
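    That bottom-to-top rule generalizes into exactly the function the question asked for. A sketch (the name approx and the fixed truncation depth are my choices): expand p to a limited number of binary digits, start from a bare R for the last 1-digit, then walk the remaining digits right to left, OR'ing in a fresh R for each 1 and AND'ing for each 0.

```c
#include <stdlib.h>

#define R ((unsigned)(rand() % 256))   /* 8 fair random bits */

/* Build an 8-bit value whose bits are each set with probability ~p,
   for p in [0, 1), by reading off p's binary digits:
   1 -> OR in a fresh R, 0 -> AND in a fresh R. */
unsigned approx(double p, int maxbits) {
    int d[64], last = -1;
    if (maxbits > 64) maxbits = 64;
    for (int i = 0; i < maxbits; i++) {   /* extract binary digits of p */
        p *= 2;
        d[i] = (p >= 1.0);
        if (d[i]) { p -= 1.0; last = i; }
    }
    if (last < 0) return 0;               /* p truncated to zero */
    unsigned x = R;                       /* the final 1-digit is a bare R */
    for (int i = last - 1; i >= 0; i--)
        x = d[i] ? (x | R) : (x & R);
    return x;
}
```

    With maxbits = 12, approx(0.7, 12) realizes 0.101100110011₂ ≈ 0.69995, already closer than the hand-written nine-digit version.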
    

    Notice that (just like with addition chains) the Russian peasant method doesn't invariably yield the tersest solution. For example, to make 0.5625₁₀ = 0.1001₂ (Godbolt):

    unsigned russianPeasant() { return (R & R & R) | R; }
    unsigned butILikeThis()   { return (R | R) & (R | R); }
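    Both expressions should measure out to the same density, 9/16 = 0.5625. A quick check (the density helper is mine):

```c
#include <stdlib.h>

#define R ((unsigned)(rand() % 256))

unsigned russianPeasant(void) { return (R & R & R) | R; }
unsigned butILikeThis(void)   { return (R | R) & (R | R); }

/* Fraction of set bits produced by f over many trials. */
double density(unsigned (*f)(void), int trials) {
    long set = 0;
    for (int t = 0; t < trials; t++) {
        unsigned v = f() & 0xFFu;
        for (int i = 0; i < 8; i++) set += (v >> i) & 1;
    }
    return (double)set / (8.0 * trials);
}
```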