So I'm wondering about the cost of division on an ATmega2560, as well as in general. Let's say I have something like this:
unsigned long long a = some_large_value;
unsigned long long b = some_other_large_value;
unsigned long result = (a - b) / A_CONSTANT;
// A_CONSTANT, e.g. 16
How long does it actually take? Are we talking about hundreds or thousands of cycles? And does it make a difference if I change the division to a multiplication, e.g. like so:
unsigned long result = (a - b) * 1 / A_CONSTANT;
I want to use this in a time-critical application to calculate a time span, which is then used to determine when to execute another part of the program. If the division takes too much time, what other options do I have?
This really depends on your A_CONSTANT and on how good the compiler is, IMO.
I've looked up the chip, and it's an 8-bit processor running at 8 or 16 MHz.
As such, I'd consider those unsigned long long integers to be the biggest hurdle to take, provided your division itself is trivial.
For that, A_CONSTANT would have to be a power of two (2, 4, 8, 16, etc.). In that case the compiler can optimize the whole division into a simple right shift, which completes in far fewer cycles.
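As a minimal sketch of that equivalence (assuming A_CONSTANT is 16, i.e. 2^4; the function names and the stdint.h types are only for illustration), the two forms below give the same result for unsigned operands, and a decent compiler performs the substitution on its own:

#include <stdint.h>

#define A_CONSTANT 16  // a power of two: 16 == 1 << 4

// Dividing an unsigned value by 16 and shifting it right by 4 are
// equivalent; the compiler applies this substitution itself whenever
// the divisor is a constant power of two.
uint32_t span_by_division(uint64_t a, uint64_t b) {
    return (uint32_t)((a - b) / A_CONSTANT);
}

uint32_t span_by_shift(uint64_t a, uint64_t b) {
    return (uint32_t)((a - b) >> 4);
}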
Switching to a multiplication won't net you anything good. Since * and / have the same precedence and associate left to right, (a - b) * 1 / A_CONSTANT is evaluated as ((a - b) * 1) / A_CONSTANT, which is just the original division again. And if you meant to compute the reciprocal first, i.e. (a - b) * (1 / A_CONSTANT), you'd at best trade the division for precision problems: since this is an integer division, where the result is rounded down, 1 / A_CONSTANT is 0 unless A_CONSTANT is 1, so the result would be 0 all the time.
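To make that concrete, here is a small sketch (function names made up for illustration, A_CONSTANT again assumed to be 16):

#include <stdint.h>

#define A_CONSTANT 16

// 1 / A_CONSTANT is an integer division: 1 / 16 == 0, so multiplying by
// this "reciprocal" yields 0 for any A_CONSTANT greater than 1.
uint32_t via_reciprocal(uint64_t a, uint64_t b) {
    return (uint32_t)((a - b) * (1 / A_CONSTANT));
}

// Without the extra parentheses, * and / associate left to right, so this
// is just the original division again -- nothing is saved.
uint32_t as_written(uint64_t a, uint64_t b) {
    return (uint32_t)((a - b) * 1 / A_CONSTANT);
}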
So whether to consider this something worth optimizing heavily depends on the actual value of A_CONSTANT.
Probably the easiest way to settle this (or to compare solutions) is to look at the resulting assembly code, because that is what actually gets executed. Optimizing purely on theory is rather complicated and might even give you wrong or misleading results.
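For example, assuming you have the avr-gcc toolchain available (the Arduino IDE ships one) and using a placeholder file name, you can compile a minimal test case straight to assembly and check what the division turns into:

// timespan.c -- minimal test case; A_CONSTANT and the file name are placeholders.
// Generate the assembly with:
//   avr-gcc -mmcu=atmega2560 -Os -S timespan.c -o timespan.s
#include <stdint.h>

#define A_CONSTANT 16

uint32_t time_span(uint64_t a, uint64_t b) {
    // With a power-of-two constant, expect a short shift sequence here;
    // with other constants, expect a call into a 64-bit division helper.
    return (uint32_t)((a - b) / A_CONSTANT);
}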