I am currently rewriting an algorithm from C to ARM assembly (for an ARM Cortex-M4 CPU).
What does my code do?
This algorithm takes an 8-bit number as input and, starting from the right (the least significant bit), tells us the position of the first bit that is 0. Here are a few examples:
Input: B01111111 Output:7
Input: B01110111 Output:3
Input: B11111110 Output:0
Here is the original C code that accomplished this:
uint8_t find_empty(uint32_t input_word)
{
    for (uint8_t searches=7; searches>=0; searches--)
    {
        if ((input_word&1)==0)
        {
            return 7-searches;
        }
        input_word=input_word>>1;
    }
    return 255;
}
And here is my beginner attempt at rewriting this in ARM (Cortex M4) assembly.
.global findEmpty
findEmpty:
    mov r1, r0          //Move input_word to r1
    //Config
    mov r0, #7          //search through 8 (7+1) bits. <-searches
FindLoop:
    tst r1, #1          //ANDs input_word with 1, sets the Z flag accordingly.
    bne NotFoundYet     //bit is 1 (Z clear): didn't get a 0, jump forward
    rsb r0, r0, #7      //searches=7-searches <- which bit is 0?
    bx lr               //Return found bit number
NotFoundYet:
    lsr r1, r1, #1      //input_word=input_word>>1
    sub r0, r0, #1      //Decrement searches
    cmp r0, #0
    bpl FindLoop        //If searches>=0, do the loop again.
    mov r0, #255        //We didn't find anything. Return 255 to signal that
    bx lr
Quick note: I used r1 as a variable here, which I heard you are not supposed to do as the compiler (I am linking my assembly “.S” file to a C file with gcc) uses r0-r3 to pass data to and receive data from functions. However, because of that it doesn’t use these registers for important things, so I don’t have to deal with pushing stuff to the stack, which saves cycles.
What’s the problem?
When I try to compile my project, gcc gives me an assembler error on the TST line:
Assembler messages: Error: unshifted register required -- `tst r1, #1'
This is very confusing to me, as I’ve looked at the Keil site for both the TST instruction and the LSR instruction, which I use later to shift r1 by 1. Yet neither page says anything about the two not being able to work together. I’ve looked online for other discussions on this topic. I came across this discussion where people were saying to tell the compiler to compile in ARM mode, but my code already is running in ARM mode, not Thumb. I confirmed this by making another .global subroutine and trying to add an immediate over 7 to a number, and sure enough it doesn’t work, like it shouldn’t if the CPU is in ARM mode.
.global illegal_add
illegal_add:
    add r0, r0, #20
    bx lr
I know very little and am out of ideas for how to tackle this issue. If anybody has any ideas or things to try, please let me know. Thank you for the help.
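It is not 100% clear to me what the problem is. Most likely, the assembler has not been set up correctly: the Cortex-M4 only supports the Thumb instruction set, so the file has to be assembled as Thumb code with unified syntax. To fix this, put these directives at the beginning of the file: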
.syntax unified
.cpu cortex-m4
.thumb
If I place these in front of your code, it assembles just fine on my machine.
A few general hints:
- Use lsrs r1, r1, #1 instead of lsr r1, r1, #1 to get a 16 bit instruction.
- Use the flags set by earlier instructions to avoid separate tst and cmp instructions. For example, if you use subs r0, r0, #1 instead of sub r0, r0, #1, you save two bytes (16 bit instruction instead of a 32 bit one) and already set the flags according to r0, saving you the subsequent cmp instruction.
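One more thing to watch for once the file assembles: in the C version, searches is a uint8_t, so the condition searches>=0 is always true and the return 255 line is never reached. Below is a minimal sketch of a reference version with a bounded loop, which can be handy for comparing against the assembly routine. The name find_empty_ref is made up for illustration, and the sketch assumes only the low 8 bits of input_word matter, as described in the question.

```c
#include <stdint.h>

/* Hypothetical reference version: returns the index (0..7) of the lowest
 * clear bit in the low byte of input_word, or 255 if all eight low bits
 * are set. */
uint8_t find_empty_ref(uint32_t input_word)
{
    /* Count upward so the loop bound does not depend on an unsigned
     * counter ever becoming negative. */
    for (uint8_t bit = 0; bit < 8; bit++) {
        if ((input_word & 1u) == 0) {
            return bit;          /* found the first 0 bit */
        }
        input_word >>= 1;        /* move the next bit into position 0 */
    }
    return 255;                  /* all eight bits were 1 */
}
```

Calling it with the examples from the question (0x7F, 0x77, 0xFE) gives 7, 3 and 0, and 0xFF gives 255.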