Search code examples
c32-bitinteger-divisionwebgpuwgsl

Divide 8-byte stored in 2 uint32 by a uint32 it on a machine with 32-bit operation


I want to divide uint64 by a uint32 in WebGPU, which currently only handles uint32 operations. So I am trying to simulate this by storing my long into two uint32 buffers high and low. and now I want to divide them by a uint32 b

I saw this answer, but I am having a hard time replicating it. I think for it to work, I should do (A / Y)x3 + (((A % Y)x + B) / Y)x2 + (((((A % Y)x + B) % Y)x + C) / Y)x + ((((((A % Y)x + B) % Y)x + C) / Y)x + D) / Y where x is 216. With that, A is the first 16 bits of high, and B is the lowest 16 bits of high. Similarly, C is the first 16 bits of low and D is the last 16 bits of low .

Here's a C code of the algorithm:

void u64DivU32(uint32_t& low, uint32_t& high, uint32_t& Y, uint32_t& res_low, uint32_t& res_high) {
    uint32_t a, b, c, d;

    a = high >> 16;
    b = (high << 16) >> 16;
    c = (low >> 16);
    d = (low << 16) >> 16;


    // compute high
    uint32_t t0 = a; // A
    uint32_t t1 = (((t0 % Y) << 16) + b); // ((A % Y)x + B)
    // (A / Y)x3 + (((A % Y)x + B) / Y)x2 
    res_high = ((t0 / Y) << 16) + (t1/Y);

    // compute low
    uint32_t t2 = (((t1 % Y) << 16) + c); // (((((A % Y)x + B) % Y)x + C) / Y)x
    uint32_t t3 = (((t2 / Y) << 16) + d); // ((((((A % Y)x + B) % Y)x + C) / Y)x + D) / Y
    res_low = ((t2 / Y) << 16) + (t3/Y);
}

However, it did not work. After a few tests, res_high is working as intended, but res_low is always off. I am also not sure how to apply this correctly when b is larger than 65535


Solution

  • I see two issues. First, you reused b, but that must have just been when entering the question as you wouldn't get anywhere close.

    The 2nd error is you put in a / instead of % for t3 calculation

    void u64DivU32(uint32_t& low, uint32_t& high, uint32_t& divisor, uint32_t& res_low, uint32_t& res_high) {
        uint32_t a, b, c, d;
    
        a = high >> 16;
        b = (high << 16) >> 16;
        c = (low >> 16);
        d = (low << 16) >> 16;
    
    
        // compute high
        uint32_t t0 = a; // A
        uint32_t t1 = (((t0 % divisor) << 16) + b); // ((A % Y)x + B)
        // (A / Y)x3 + (((A % Y)x + B) / Y)x2 
        res_high = ((t0 / divisor) << 16) + (t1/divisor);
    
        // compute low
        uint32_t t2 = (((t1 % divisor) << 16) + c); 
        uint32_t t3 = (((t2 % divisor) << 16) + d); 
        res_low = ((t2 / divisor) << 16) + (t3/divisor);
    }