I have the following C function:
int main_compare (int nbytes, char *pmem1, char *pmem2){
    for(nbytes--; nbytes>=0; nbytes--) {
        if(*(pmem1+nbytes) - *(pmem2+nbytes) != 0) {
            return 0;
        }
    }
    return 1;
}
and I want to convert it into ARM Cortex-M3 assembler code. I'm not really good at this, and I don't have a suitable compiler to test whether I'm doing it right. But here is what I have so far:
byte_cmp_loop PROC
; assuming: r0 = nbytes, r1=pmem1, r2 = pmem2
SUB R0, R0, #1 ; nBytes - 1 as maximal value for loop counter
_for_loop:
ADD R3, R1, R0 ;
ADD R4, R2, R0 ; calculate pmem + n
LDRB R3, [R3] ;
LDRB R4, [R4] ; look at this address
CMP R3, R4 ; if cmp = 0, then jump over return
BE _next ; if statement by "branch"-cmd
MOV R0, #0 ; return value is zero
BX LR ; always return 0 here
_next:
sub R0, R0, #1 ; loop counting
BLPL _for_loop ; pl = if positive or zero
MOV R0, #1 ;
BX LR ; always return 1 here
ENDP
But I'm really not sure whether this is right, and I have no idea how to check it.
I see just 3 fairly simple problems there:
BE _next ; if statement by "branch"-cmd
...
sub R0, R0, #1 ; loop counting
BLPL _for_loop ; pl = if positive or zero
1. BEQ, not BE - condition codes are always 2 letters.
2. SUB alone won't update the flags - you need the suffix to say so, i.e. SUBS.
3. BLPL would branch and link, thus overwriting your return address - you want BPL. Actually, BLPL wouldn't assemble here anyway, since in Thumb a conditional BL would need an IT to set it up (unless of course your assembler is clever enough to insert one automatically).

Edit: there's also of course a more general issue with the use of R4 in both the original code and my examples below - if you're interfacing with C code the original value must be preserved across the function call and restored afterwards (R0-R3 are designated argument/scratch registers and can be freely modified). If you're in pure assembly however you don't necessarily need to follow a standard calling convention so can be more flexible.
Now, that's a very literal representation of the C code, and doesn't make best use of the instruction set - in particular the indexed addressing modes. One of the attractions of assembly programming is having complete control of the instructions, so how can we make it worth our while?
First, let's make the C code look a little more like the assembly we want:
int main_compare (int nbytes, char *pmem1, char *pmem2){
    while(nbytes-- > 0) {
        if(*pmem1++ != *pmem2++) {
            return 0;
        }
    }
    return 1;
}
Now that that shows our intent more clearly, let's play compiler:
byte_cmp_loop PROC
; assuming: r0 = nbytes, r1=pmem1, r2 = pmem2
_loop:
SUBS R0, R0, #1 ; Decrement nbytes and set flags based on the result
BMI _finished ; If nbytes is now negative, it was 0, so we're done
LDRB R3, [R1], #1 ; Load from the address in R1, then add 1 to R1
LDRB R4, [R2], #1 ; ditto for R2
CMP R3, R4 ; If they match...
BEQ _loop ; then continue round the loop
MOV R0, #0 ; else give up and return zero
BX LR
_finished:
MOV R0, #1 ; Success!
BX LR
ENDP
And that's nearly 25% fewer instructions! Now if we pull in another instruction set feature - conditional execution - and relax the requirements slightly, without breaking C semantics, it gets smaller still:
byte_cmp_loop PROC
; assuming: r0 = nbytes, r1=pmem1, r2 = pmem2
_loop:
SUBS R0, R0, #1 ; In C zero is false and any nonzero value is true, so
; when R0 becomes -1 to trigger this branch, we can just
; return that to indicate success
IT MI ; Make the following instruction conditional on 'minus'
BXMI LR
LDRB R3, [R1], #1
LDRB R4, [R2], #1
CMP R3, R4
BEQ _loop
MOVS R0, #0 ; Using MOVS rather than MOV to get a 16-bit encoding,
; since updating the flags won't matter at this point
BX LR
ENDP
That assembles to a meagre 22 bytes, nearly 40% less code than we started with :D
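As for checking the result without a target toolchain: one thing that can be tested on any host machine is that the rewritten C still behaves like the original before it gets translated by hand. Below is a minimal sketch, assuming a hosted C compiler. The names compare_indexed, compare_postinc and count_mismatches are made up for this example, and memcmp is used as the reference.

```c
#include <string.h>

/* The indexed loop from the question (renamed; the name is hypothetical). */
static int compare_indexed(int nbytes, char *pmem1, char *pmem2) {
    for (nbytes--; nbytes >= 0; nbytes--) {
        if (*(pmem1 + nbytes) - *(pmem2 + nbytes) != 0) {
            return 0;
        }
    }
    return 1;
}

/* The post-increment rewrite from the answer (renamed; hypothetical name). */
static int compare_postinc(int nbytes, char *pmem1, char *pmem2) {
    while (nbytes-- > 0) {
        if (*pmem1++ != *pmem2++) {
            return 0;
        }
    }
    return 1;
}

/* Counts the cases where either version disagrees with memcmp. */
static int count_mismatches(void) {
    char a[] = "cortex-m3";
    char b[] = "cortex-m3";   /* identical to a */
    char c[] = "cortex-m4";   /* differs from a in the last character */
    char *rhs[] = { b, c };
    int failures = 0;

    for (int i = 0; i < 2; i++) {
        /* Try every length from 0 up to the full buffer, terminator included. */
        for (int n = 0; n <= (int)sizeof a; n++) {
            int expected = (memcmp(a, rhs[i], (size_t)n) == 0);
            if (compare_indexed(n, a, rhs[i]) != expected) failures++;
            if (compare_postinc(n, a, rhs[i]) != expected) failures++;
        }
    }
    return failures;
}
```

Calling count_mismatches() from a main and getting 0 back means both versions agree with memcmp on those inputs. This only checks the C, not the assembly. Also note that the last assembly version returns -1 rather than 1 on success, so a caller should test the result with if (result) rather than result == 1.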