Tags: c, assembly, arm, cortex-m, thumb

unshifted register required - Assembler throws error on the TST instruction


I am currently rewriting an algorithm from C to ARM assembly (ARM Cortex-M4 CPU).

What does my code do?

This algorithm takes an 8-bit number as input and, scanning from the right (the least significant bit), reports the position of the first bit that is 0. Here are a few examples:

Input: B01111111 Output: 7

Input: B01110111 Output: 3

Input: B11111110 Output: 0

Here is the original C code that accomplished this:

uint8_t find_empty(uint32_t input_word)
{
  for (int8_t searches=7; searches>=0; searches--) // signed counter: a uint8_t is always >=0, so the test could never fail
  {
    if ((input_word&1)==0)
    {
      return 7-searches;
    }
    
    input_word=input_word>>1;
  }
  return 255;
}
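For checking a port against the examples above, a small host-side reference can help. The following is only a sketch: `find_empty_ref` and `check_examples` are hypothetical names, and the function counts the bit index upward instead of counting `searches` down, which should give the same results for the low eight bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference version: returns the index of the lowest 0 bit
   among bits 0..7, or 255 if all eight bits are set. */
uint8_t find_empty_ref(uint32_t input_word)
{
    for (uint8_t bit = 0; bit < 8; bit++) {
        if (((input_word >> bit) & 1u) == 0) {
            return bit;
        }
    }
    return 255; /* no 0 bit found in the low byte */
}

/* Checks the three examples from the question plus the "nothing found" case. */
void check_examples(void)
{
    assert(find_empty_ref(0x7F) == 7);   /* B01111111 */
    assert(find_empty_ref(0x77) == 3);   /* B01110111 */
    assert(find_empty_ref(0xFE) == 0);   /* B11111110 */
    assert(find_empty_ref(0xFF) == 255); /* B11111111 */
}
```

Once the assembly routine links, the same assertions could be pointed at `findEmpty` instead.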

And here is my beginner attempt at rewriting this in ARM (Cortex M4) assembly.

.global findEmpty
findEmpty:
    mov r1, r0 //Move input_word to r1
    
    //Config
    mov r0, #7 //search through 8 (7+1) bits. <-searches

    FindLoop:
      tst r1, #1 //ANDs input_word with 1, sets the Z flag accordingly.
      
      bne NotFoundYet //Z clear means bit 0 is 1: didn't get a 0, jump forward
        rsb r0, r0, #7 //searches=7-searches <- which bit is 0? 
        bx lr //Return found bit number
      
      NotFoundYet:
      lsr r1, r1, #1 //input_word=input_word>>1

      sub r0, r0, #1 //Decrement searches
      cmp r0, #0
      bpl FindLoop //If searches>=0, do the loop again. 
    mov r0, #255 //We didn't find anything. Return 255 to signal that
    bx lr

Quick note: I used r1 as a scratch register here, which I heard you are not supposed to do because the compiler (I am linking my assembly “.S” file to a C file with gcc) uses r0-r3 to pass data to functions and receive data from them. However, because of that, it doesn’t keep important values in these registers across a call, so I don’t have to push anything to the stack, which saves cycles.

What’s the problem?

When I try to compile my project gcc gives me an assembler error on the TST line:

Assembler messages: Error: unshifted register required -- `tst r1, #1'

This is very confusing to me. I’ve looked at the Keil site for both the TST instruction and the LSR instruction, which I use later to shift r1 by 1, yet neither page says anything about the two not being able to work together. I’ve also looked online for other discussions on this topic. I came across one discussion where people suggested telling the compiler to compile in ARM mode, but as far as I can tell my code is already in ARM mode, not Thumb. I tried to confirm this by making another .global subroutine that adds an immediate over 7 to a number, and sure enough it doesn’t work, like it shouldn’t if the CPU is in ARM mode.

.global illegal_add
illegal_add:
    add r0, r0, #20
    bx lr

I know very little and am out of ideas for how to tackle this issue. If anybody has ideas for things to try, please let me know. Thank you for the help.


Solution

  • It is not 100% clear to me what the problem is. Most likely, you forgot to set up the assembler correctly. To fix this, issue these directives at the beginning of the file:

    .syntax unified
    .cpu cortex-m4
    .thumb
    

    If I place these in front of your code, it assembles just fine on my machine.

    A few general hints:

    • read up on which instructions are 16-bit encodable and try to pick from these. 16-bit instructions consume less memory and can execute faster. For example, you can use lsrs r1, r1, #1 instead of lsr r1, r1, #1 to get a 16-bit instruction.
    • be more clever with flag manipulations. Many instructions already set flags for you, and if you are clever, you can likely avoid all tst and cmp instructions. For example, if you use subs r0, r0, #1 instead of sub r0, r0, #1, you save two bytes (it is a 16-bit instruction) and already set the flags (N and Z included) according to the new value of r0, saving you the subsequent cmp instruction.
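As a rough illustration of the flag hint, the loop can be arranged so that the shift and the decrement themselves produce the flags the branches need. The C model below is only a sketch under stated assumptions: `find_empty_model` is a made-up name, and the instruction names in the comments indicate which assembly instruction each step is meant to correspond to. The assembly itself has not been assembled or tested here.

```c
#include <stdint.h>

/* Hypothetical C model of a loop that needs no tst or cmp. */
uint8_t find_empty_model(uint32_t input_word)
{
    int32_t searches = 7;                    /* movs r0, #7 */

    do {
        /* lsrs r1, r1, #1: the bit shifted out lands in the carry flag */
        uint32_t carry = input_word & 1u;
        input_word >>= 1;

        if (carry == 0) {                    /* bcc: carry clear, a 0 bit was found */
            return (uint8_t)(7 - searches);  /* rsb r0, r0, #7 */
        }

        searches -= 1;                       /* subs r0, r0, #1 also sets the N flag */
    } while (searches >= 0);                 /* bpl FindLoop: repeat while not negative */

    return 255;                              /* no 0 bit in the low eight bits */
}
```

The design choice here is that each flag-setting instruction does double duty: the shift both advances the word and exposes the tested bit, and the decrement both updates the counter and decides whether to loop again.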