Tags: algorithm, avr, multiplication, 8-bit, attiny

tinyAVR: best known multiplication routines for 8-bit and 16-bit factors?


"Faster than avr200b.asm"? The mpy8u routine from avr200b.asm, written for those processors of Atmel's AVR family that do not implement any of the MUL instructions, seems pretty generic, but mpy16u looks sloppy: it rotates both lower result bytes 16 times instead of 8. Antonio presented a fast 16×16→16 unsigned multiplication that takes 64 cycles worst case, excluding call/return overhead.
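For readers who don't have the application note at hand, the shift-and-add scheme that routines of this kind are built on can be modelled in C. This is only a sketch of the general technique, not a transcription of avr200b.asm: the function name `mul8x8_shift_add` is made up for illustration, and nothing about AVR cycle counts can be read off it.

```c
#include <stdint.h>

/* Hypothetical helper: 8x8 -> 16 bit unsigned shift-and-add multiplication.
 * The multiplier starts out in the low result byte and is shifted out
 * bit by bit while the product is shifted in from the high byte. */
uint16_t mul8x8_shift_add(uint8_t multiplicand, uint8_t multiplier)
{
    uint8_t hi = 0;           /* high byte of the result */
    uint8_t lo = multiplier;  /* low byte of the result, initially the multiplier */

    for (uint8_t i = 0; i < 8; i++) {
        uint8_t carry = 0;
        if (lo & 1) {         /* current multiplier bit set: add multiplicand */
            uint16_t sum = (uint16_t)hi + multiplicand;
            hi = (uint8_t)sum;
            carry = (uint8_t)(sum >> 8);  /* carry out of the 8-bit addition */
        }
        /* shift the 17-bit quantity carry:hi:lo right by one bit */
        lo = (uint8_t)((lo >> 1) | (hi << 7));
        hi = (uint8_t)((hi >> 1) | (carry << 7));
    }
    return (uint16_t)((hi << 8) | lo);
}
```

The loop runs once per multiplier bit, so an 8-bit multiplier needs 8 passes. That iteration count is where most of the worst-case cycle count comes from.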
As optimisation goals I (arbitrarily) suggest, in order of decreasing priority: worst-case cycle count, word count (RAM and flash), register usage, and expected cycle count.
(There are reduced-core AVRs ("single digit" ATtiny, 10/20/40) with differences including timing, which I suggest ignoring.)

(Caution: Don't take any claim herein for granted, at least not without independent affirmation.)

What are the best currently known 8×8→8/16, 16×16→16/32 and 16×8→16/24 bit multiplication routines for AVRs without MUL?


Solution

  • A list of pertinent algorithms and implementations for signed and unsigned 8×8→8/16, 16×16→16/32 and 8×16→16/24 bit multiplication, as a starting point: