Search code examples
cbit-manipulation

How to explain this program calculating 3/4*x using bit operation?


For a given value of int type, calculating 3/4*x using the most basic bit operations(i.e. no while/for or other C control logic. Also suppose sizeof(int) = 4 (Byte). And the hard point is to truncate towards 0 and avoiding overflow.

What I have tried:

Suppose expressing x as its binary form: x = [b_31 b_30 ... b_2 b_1 b_0]
Then 3/4*x = x/2 + x/4 = ([b_31 ... b_1 0] + b_0) >> 1 + ([b_31 ... b_2 0 0] + b_1 b_0) >> 2
Then using x>>1 + x>>2 we can get [b_31 ... b_1 0]>>1 + [b_31 ... b_2 0 0]>>2 without precision lost or overflow problem, leaving only [b_0 0] + [b_1 b_0]>> 2 to deal with. But I got stuck here for I had no idea how to implement truncating towards zero.

Comparing to the following sample program, I found two variables x_mask and bias used. I guess they are used to solve the truncate problem since for a negative number bias will always be 11 and for a positive number 00. Can anyone help explain the logic here?

int threefourths(int x) {
     int xl2 = x & 0x3;
     int xl1 = (x&1) << 1;
     int x_mask = x >> ((sizeof(int)<<3)-1);
     int bias = x_mask & 3;
     int incr = (xl2+xl1+bias) >> 2;
     int s2 = x >> 2;
     int s1 = x >> 1;
     return s1 + s2 + incr;
     }

Solution

  • We want to compute 0.75 * x, with the result rounded towards 0. Because 0.75 == 0.5 + 0.25, computing it as (x >> 1) + (x >> 2) is a good approximation, provided the right shifts are mapped to arithmetic right shift instructions, something ISO C does not guarantee. The standard specifies:

    The result of E1 >> E2 is E1 right-shifted E2 bit positions. [...] If E1 has a signed type and a negative value, the resulting value is implementation-defined.

    The approximation will frequently underestimate the desired result, because (1) the division of negative integers by arithmetic right shift rounds to negative infinity, and (2) truncation both of the two individual terms results in an underestimation compared to truncating once, as in the reference computation.

    So we know that a correction may need to be applied, and such a correction must be zero or positive. Because the division is by 4, there are four remainder cases to consider per sign-bit setting, giving a total of eight cases. In the posted code, the four remainder cases are extracted as x & 3 which is the same as x % 4. I am not able to follow the details of code further, but would like to show a alternative that is easier to understand.

    A simple enumeration of the eight possible cases shows that corrections are needed as shown in the table below. The value of the corrections is determined by subtracting the approximation (x >> 1) + (x >> 2) from the reference result, for representative values belonging to each of the eight classes, e.g. x in [0x7ffffffc, 0x80000003]. The corrections for negative arguments are larger than those for positive arguments because both of the underestimation effects enumerated above combine.

       s = sign bit, e.g. x<31>, x1 = bit x<1>, x0 = bit x<0>
    
       x%4 s x1 x0 | corr
       ------------+-----
        0  0  0  0 |  0
        1  0  0  1 |  0
        2  0  1  0 |  0
        3  0  1  1 |  1
        0  1  0  0 |  0
        1  1  0  1 |  1
        2  1  1  0 |  1
        3  1  1  1 |  2
    

    It is straightforward to see that the required correction equates to (x0 & x1) + (s & (x0 | x1)). We can slightly optimize this at the cost of clarity, as shown in the following implementation:

    int threefourths (int x) 
    {
        int s2 = x >> 2; // ensure this maps to arithmetic right shift instruction
        int s1 = x >> 1; // ensure this maps to arithmetic right shift instruction
        unsigned int ux = (unsigned int)x;
        unsigned int s = ux >> (sizeof(ux) * CHAR_BIT - 1);
    #if READABILITY
        unsigned int x0 = ux & 1;
        unsigned int x1 = s1 & 1;
        unsigned int corr = (x0 & x1) + (s & (x0 | x1));
    #else // SPEED
        unsigned int corr = ((ux & s1) & 1) + (s & (ux | s1));
    #endif // REDABILITY vs SPEED
        return (int)(s1 + s2 + corr);
    }
    
    int ref_threefourths (int x)
    {
        return (int)((double)x * 0.75);
    }