I'm writing an assembly program for the Motorola 68k processor using Easy68K to compute the factorial of an integer $N$.
My current implementation:
ORG $8000
num DC.W 4 ; e.g. I want to calculate the factorial of 4
result DC.L 0 ; Space to store the result
START:
move.w num,d0 ; d0 = 4
ble.s ZorN
move.w d0,d1 ;also d1 = 4
subq.w #1,d0 ; d0 = d0 - 1 = 4 - 1 = 3
LOOP:
mulu.w d0,d1 ; 1st cycle: d1 = d1 * d0 = 4 * 3 = 12 [d1 is now 12].
; 2nd cycle: d1 = d1 * d0 = 12 * 2 = 24 [d1 is now 24].
; 3rd cycle: d1 = d1 * d0 = 24 * 1 = 24 [d1 is now 24].
subq.w #1,d0 ; 1st cycle: d0 = d0 - 1 = 3 - 1 = 2 [d0 is now 2].
; 2nd cycle: d0 = d0 - 1 = 2 - 1 = 1 [d0 is now 1].
; 3rd cycle: d0 = d0 - 1 = 1 - 1 = 0 [d0 is now 0].
bne.s LOOP ; Repeat the loop if d0 is not zero
move.l d1,result
bra.s END
ZorN:
blt.s END
moveq.l #1,d1 ;0! = 1
move.l d1,result
END:
SIMHALT
END START
beq.s
, blt.s
, bra.s
), and maybe there's a cleaner way to structure the loop.Any suggestions for improving efficiency and readability would be greatly appreciated :-)!
First you start with an elimination process for the negative and very small numbers.
Then follow that by a multiplications loop that ends as soon as D0 becomes less or equal to 2. In case the inputted number itself were 2, then this loop will do a redundant multiplication by 1. I consider that an acceptable loss. You would hardly notice it.
ORG $8000
num DC.W 4
result DC.L 0
START:
clr.l d1
move.w num, d0
blt.s Done ; D1 = 0 for negative numbers
moveq.l #1, d1
cmpi.w #2, d0
blt.s Done ; D1 = 1 for 0! and 1!
move.w d0, d1 ; The input number is 2 or more
LOOP:
subq.w #1, d0 ; 4 - 1 = 3, 3 - 1 = 2
mulu.w d0, d1 ; 3 * 4 = 12, 2 * 12 = 24
cmpi.w #2, d0
bgt.s LOOP
Done:
move.l d1, result
SIMHALT
END START
The loop now will do:
3 iterations for input 5
2 iterations for input 4
1 iteration for input 3
1 iteration for input 2
0 iterations for input 1
0 iterations for input 0
result DC.L 0
About result being a longword. Although the mulu.w
instruction produces a full longword, all next iterations in the loop will only be using the low word from that product. However, the final multiplication can leave the result above 65535, and therefore having DC.L
is a good choice.