c optimization theory integer-arithmetic

Trick to divide a constant (power of two) by an integer

NOTE This is a theoretical question. I'm happy with the performance of my actual code as it is. I'm just curious about whether there is an alternative.

Is there a trick to do an integer division of a constant value, which is itself an integer power of two, by an integer variable value, without having to use do an actual divide operation?

// The fixed value of the numerator
#define SIGNAL_PULSE_COUNT 0x4000UL

// The division that could use a neat trick.
uint32_t signalToReferenceRatio(uint32_t referenceCount)
{
    // Promote the numerator to a 64 bit value, shift it left by 32 so
    // the result has an adequate number of bits of precision, and divide
    // by the numerator.
    return (uint32_t)((((uint64_t)SIGNAL_PULSE_COUNT) << 32) / referenceCount);
}

I've found several (lots) of references for tricks to do division by a constant, both integer and floating point. For example, the question What's the fastest way to divide an integer by 3? has a number of good answers including references to other academic and community materials.

Given that the numerator is constant, and it's an integer power of two, is there a neat trick that could be used in place of doing an actual 64 bit division; some kind of bit-wise operation (shifts, AND, XOR, that kind of stuff) or similar?

I don't want any loss of precision (beyond a possible half bit due to integer rounding) greater than that of doing the actual division, as the precision of the instrument relies on the precision of this measurement.

_{"Let the compiler decide" is not an answer, because I want to know if there is a trick.}

Extra, Contextual Information

I'm developing a driver on a 16 bit data, 24 bit instruction word micro-controller. The driver does some magic with the peripheral modules to obtain a pulse count of a reference frequency for a fixed number of pulses of a signal frequency. The required result is a ratio of the signal pulses to the reference pulse, expressed as an unsigned 32 bit value. The arithmetic for the function is defined by the manufacturer of the device for which I'm developing the driver, and the result is processed further to obtain a floating point real-world value, but that's outside the scope of this question.

The micro-controller I'm using has a Digital Signal Processor that has a number of division operations that I could use, and I'm not afraid to do so if necessary. There would be some minor challenges to overcome with this approach, beyond the putting together the assembly instructions to make it work, such as the DSP being used to do a PID function in a BLDC driver ISR, but nothing I can't manage.

Solution

You cannot use clever mathematical tricks to not do a division, but you can of course still use programming tricks if you know the range of your reference count:

Nothing beats a pre-computed lookup table in terms of speed.
There are fast approximate square root algorithms (probably already in your DSP), and you can improve the approximation by one or two Newton-Raphson iterations. If doing the computation with floating-point numbers is accurate enough for you, you can probably beat a 64bit integer division in terms of speed (but not in clarity of code).

You mentioned that the result will be converted to floating-point later, it might be beneficial to not compute the integer division at all, but use your floating point hardware.