Why can't I move #1001 into r5 on arm?

I have an RPi4 and I'm trying to write some code in assembly to loop 1000 times. The code works fine when I set a lower number of loops, but when I try to set it to 1001, gcc says:

loop.s: Assembler messages:
loop.s:15: Error: invalid constant (3e9) after fixup

Here's the code:

.data
ms3: .asciz "%d\n"
.text
.global main
.extern printf
main:
    push {ip, lr}
    mov r1, #0
    mov r5, #1001

loop1000:
    push {r1}
    ldr r0, =ms3
    bl printf
    pop {r1}
    add r1, #1
    cmp r1, r5
    bne loop1000
    pop {ip, pc}

Solution

Assembly languages are defined by the tool, not the target, so the solutions and the exact instruction syntax vary. You mentioned gcc, which implies gnu assembler, although assembly fed through gcc is yet another gnu arm assembly language.

With gnu assembler the ldr = pseudo-instruction will use the optimal instruction if it can; otherwise it will do a pc-relative load. If you want full control, then only use the ldr = thing for labels (clearly its original intent).

.cpu arm7tdmi
ldr r5,=1001
ldr r5,=0x00990000
ldr r5,=0x00990099
ldr r5,=0x90000009


.thumb
.cpu cortex-m0
ldr r5,=1001

.cpu cortex-m3
ldr r5,=1001
movw r5,#1001
ldr r5,=0x00990099
.align

Disassembly of section .text:

00000000 <.text>:
   0:   e59f5018    ldr r5, [pc, #24]   ; 20 <.text+0x20>
   4:   e3a05899    mov r5, #10027008   ; 0x990000
   8:   e59f5014    ldr r5, [pc, #20]   ; 24 <.text+0x24>
   c:   e3a05299    mov r5, #-1879048183    ; 0x90000009
  10:   4d03        ldr r5, [pc, #12]   ; (20 <.text+0x20>)
  12:   f240 35e9   movw    r5, #1001   ; 0x3e9
  16:   f240 35e9   movw    r5, #1001   ; 0x3e9
  1a:   f04f 1599   mov.w   r5, #10027161   ; 0x990099
  1e:   bf00        nop
  20:   000003e9    andeq   r0, r0, r9, ror #7
  24:   00990099    umullseq    r0, r9, r9, r0

Starting in the middle, with your question:

  10:   4d03        ldr r5, [pc, #12]   ; (20 <.text+0x20>)

1001 (0x3e9) does not fit within the 8-bit immediate (no rotation) of the thumb mov immediate instruction, so with ldr = the assembler created a pc-relative load, which has pros and cons.
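To see where a pc-relative load like that one points, here is a minimal Python sketch of the address arithmetic. The function name literal_address is made up for illustration. It assumes the usual rule that pc reads as the instruction address plus 4 in thumb (plus 8 in arm) and is word-aligned before the offset is added; check the arm documentation for the authoritative definition.

```python
def literal_address(instr_addr, offset, thumb):
    """Address read by `ldr rX, [pc, #offset]` located at instr_addr.

    Assumption: pc reads as instr_addr + 4 in thumb and instr_addr + 8
    in arm, and the base is rounded down to a multiple of 4.
    """
    pc = instr_addr + (4 if thumb else 8)
    base = pc & ~3  # word-align the pc value
    return base + offset


# The thumb load at 0x10 with offset 12 from the listing:
print(hex(literal_address(0x10, 12, thumb=True)))
# The arm loads at 0x0 (offset 24) and 0x8 (offset 20):
print(hex(literal_address(0x0, 24, thumb=False)))
print(hex(literal_address(0x8, 20, thumb=False)))
```

The results line up with the addresses objdump prints in the comments (0x20 and 0x24), which is where the .word values 0x3e9 and 0x00990099 sit in the listing.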

There is a thumb2 extension, only available on some processors, that does support larger immediates:

  12:   f240 35e9   movw    r5, #1001   ; 0x3e9

It even can do weird things like this.

  1a:   f04f 1599   mov.w   r5, #10027161   ; 0x990099

Both the ldr = and using movw directly resulted in the same instruction (as expected).

  12:   f240 35e9   movw    r5, #1001   ; 0x3e9
  16:   f240 35e9   movw    r5, #1001   ; 0x3e9
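If you want to convince yourself that f240 35e9 really carries 1001, a small Python sketch can pull the fields back out. The name decode_movw is hypothetical, and the field layout (imm4 and i in the first halfword; imm3, Rd and imm8 in the second) is my reading of the thumb2 movw encoding, so verify it against the arm documentation.

```python
def decode_movw(hw1, hw2):
    """Return (rd, imm16) for a 32-bit thumb2 movw given as two halfwords.

    Assumed layout: hw1 = 11110 i 100100 imm4
                    hw2 = 0 imm3 Rd imm8
    and imm16 = imm4:i:imm3:imm8.
    """
    if (hw1 & 0xFBF0) != 0xF240 or (hw2 & 0x8000) != 0:
        raise ValueError("not a movw encoding under the assumed layout")
    imm4 = hw1 & 0xF
    i = (hw1 >> 10) & 1
    imm3 = (hw2 >> 12) & 0x7
    rd = (hw2 >> 8) & 0xF
    imm8 = hw2 & 0xFF
    imm16 = (imm4 << 12) | (i << 11) | (imm3 << 8) | imm8
    return rd, imm16


rd, imm16 = decode_movw(0xF240, 0x35E9)
print(f"movw r{rd}, #{imm16}")
```

The point is only that movw holds a plain 16-bit immediate scattered across the two halfwords, with no rotation or replication involved.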

There was some confusion in the comments (everyone needs to go read the documentation not just the OP)

   0:   e59f5018    ldr r5, [pc, #24]   ; 20 <.text+0x20>
   4:   e3a05899    mov r5, #10027008   ; 0x990000
   8:   e59f5014    ldr r5, [pc, #20]   ; 24 <.text+0x24>
   c:   e3a05299    mov r5, #-1879048183    ; 0x90000009

Arm mode cannot do the 0x00990099 thing, but it can do an 8-bit value rotated by an even number of bits, such as 0x00990000 and 0x90000009, but not 0x000001FE, 0x102, and so on.
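The arm-mode rule is easy to check by brute force. This is a minimal Python sketch, not anything the assembler itself runs; arm_immediate_encoding is a made-up name, and it assumes the usual description of the operand as an 8-bit value rotated right by twice a 4-bit field.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def arm_immediate_encoding(value):
    """Return (rotate_field, imm8) if value fits an arm-mode immediate, else None.

    Assumption: the encoded constant is imm8 rotated right by 2 * rotate_field,
    so undoing it is a rotate left by the same even amount.
    """
    value &= MASK32
    for rot in range(0, 32, 2):
        imm8 = rol32(value, rot)
        if imm8 <= 0xFF:
            return rot // 2, imm8
    return None


for constant in (0x00990000, 0x90000009, 1001, 0x1FE, 0x102, 0x00990099):
    print(hex(constant), arm_immediate_encoding(constant))
```

For 0x00990000 this gives rotate field 8 with imm8 0x99, and for 0x90000009 rotate field 2 with imm8 0x99, matching the low 12 bits of e3a05899 and e3a05299 in the disassembly. The other four constants come back as None.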

Arm uses 32-bit instructions and, like mips and others, is limited in how many bits of immediate are possible while leaving room for the opcode, for lack of a better term. Thumb instructions are 16 bits, so much less room is available for an immediate. The thumb2 extensions add instructions that take 2x16 bits. They couldn't use the arm encoding in general, and for some reason they also didn't use the same immediate scheme that you see in arm instructions, so you have this replicate-and-shift thing rather than just a shift thing.
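Here is a rough Python sketch of that thumb2 immediate scheme, so you can see why 0x00990099 works there while 1001 needs movw. The function names are hypothetical, and the expansion rules are my paraphrase of the modified-immediate description in the arm documentation (a 12-bit field that is either a byte replicated in one of four patterns, or an 8-bit value with its top bit set rotated right by 8 to 31); treat it as a sketch under that assumption.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def thumb_expand_imm(imm12):
    """Expand a 12-bit thumb2 modified immediate field to a 32-bit constant."""
    imm8 = imm12 & 0xFF
    if (imm12 >> 10) == 0:
        mode = (imm12 >> 8) & 0x3
        if mode == 0:
            return imm8                      # 0x000000XY
        if mode == 1:
            return (imm8 << 16) | imm8       # 0x00XY00XY
        if mode == 2:
            return (imm8 << 24) | (imm8 << 8)  # 0xXY00XY00
        return imm8 * 0x01010101             # 0xXYXYXYXY
    # Otherwise: 1bcdefgh rotated right by the top five bits of the field.
    return ror32(0x80 | (imm12 & 0x7F), imm12 >> 7)


def thumb2_modified_immediate(value):
    """Return a 12-bit field that expands to value, or None if there is none."""
    value &= MASK32
    for imm12 in range(0x1000):
        # Replicated patterns with a zero byte are skipped (assumed not usable).
        if 0x100 <= imm12 < 0x400 and (imm12 & 0xFF) == 0:
            continue
        if thumb_expand_imm(imm12) == value:
            return imm12
    return None


print(hex(thumb_expand_imm(0x199)))          # field taken from f04f 1599
print(thumb2_modified_immediate(1001))       # no such field; movw covers it
```

The field 0x199 is the i:imm3:imm8 bits read out of the mov.w encoding f04f 1599, and it expands to 0x00990099. 1001 has no modified-immediate form, which fits with the assembler choosing movw for it.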

All of this is in the arm documentation which you should have next to you when writing/learning assembly language.

Assembly language is defined by the tool (the assembler), not the target, so gnu assembler, Keil, armasm and others are expected to have different assembly languages (mostly in the non-instruction area), and they do. The same goes for any other target (x86, mips, etc). As a general rule there usually aren't standardized assembly languages, certainly not for the mainline instruction sets.

That said, the ldr rx,=label/address trick has, with gnu assembler, resulted in the optimal instruction, but it is a pseudo-instruction, not a real instruction. As such it is not expected to be supported at all on some assemblers, and some that support it may literally implement a pc-relative load and not optimize (it is within the realm of possibility that one might have a command line option to enable/disable the optimization).

You built for thumb, and for thumb you are limited to an unshifted 8-bit immediate. If your cpu happens to support thumb2 as well, then you can tell the assembler that on the command line or in the code and it will generate the optimized instruction, and/or you can specify the instruction directly. If thumb2 is not supported then you can either directly craft a pc-relative load

ldr r5,hello
...
hello: .word 1001

or use the ldr equals thing, or use multiple instructions: 3 shifted left 8, orred with 0xE9, that kind of thing.
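The multiple-instruction route can be sketched in Python as a tiny generator plus a check that the steps really rebuild the constant. All three function names are invented for this example. The mnemonics assume gnu assembler unified syntax (movs, lsls, adds), and it uses adds rather than orr because, as far as I know, 16-bit thumb has no orr with an immediate; after the shift the low 8 bits are zero, so the add gives the same result as an or.

```python
def split_into_steps(value):
    """Break a 32-bit constant into (mnemonic, operand) steps, one byte at a time."""
    value &= 0xFFFFFFFF
    chunks = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    while len(chunks) > 1 and chunks[0] == 0:
        chunks.pop(0)  # drop leading zero bytes
    steps = [("movs", chunks[0])]
    for byte in chunks[1:]:
        steps.append(("lsls", 8))
        if byte:
            steps.append(("adds", byte))
    return steps


def evaluate(steps):
    """Run the steps on a pretend 32-bit register and return its final value."""
    reg = 0
    for op, operand in steps:
        if op == "movs":
            reg = operand
        elif op == "lsls":
            reg = (reg << operand) & 0xFFFFFFFF
        elif op == "adds":
            reg = (reg + operand) & 0xFFFFFFFF
    return reg


def render(steps, reg="r5"):
    """Format the steps as assembly text (unified syntax assumed)."""
    lines = []
    for op, operand in steps:
        if op == "lsls":
            lines.append(f"lsls {reg}, {reg}, #{operand}")
        else:
            lines.append(f"{op} {reg}, #{operand:#x}")
    return lines


steps = split_into_steps(1001)
print("\n".join(render(steps)))
print(evaluate(steps))
```

For 1001 this comes out as three instructions (move 3, shift left 8, add 0xe9), so it costs more code than a single pc-relative load but needs no literal pool.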

Edit

Just for Jake...

.thumb

.cpu cortex-m0
ldr r5,=1001

.cpu cortex-m3
ldr r5,=1001

.align

arm-none-eabi-as --version
GNU assembler (GNU Binutils) 2.34
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `arm-none-eabi'.

00000000 <.text>:
   0:   4d01        ldr r5, [pc, #4]    ; (8 <.text+0x8>)
   2:   f240 35e9   movw    r5, #1001   ; 0x3e9
   6:   bf00        nop
   8:   000003e9    andeq   r0, r0, r9, ror #7

For armv6-m (and armv4t, armv5t, armv6, current armv8ms) you cannot use movw, which is what the OP's error message implied.

For armv7 and armv7-m you can, and the ldr = pseudo-instruction generates it. Instead of having to keep changing your code based on which immediates you choose, if you use gnu assembler then ldr equals is the best way to go.

arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
00000000 <.text>:
   0:   4d01        ldr r5, [pc, #4]    ; (8 <.text+0x8>)
   2:   f240 35e9   movw    r5, #1001   ; 0x3e9
   6:   bf00        nop
   8:   000003e9    andeq   r0, r0, r9, ror #7

Now, while feeding assembly language through gcc is yet another assembly language, it still, as expected, generates the ideal instruction when ldr equals is used. Where you can use movw it does, where you cannot it does not. But let's try this.

.thumb

.cpu cortex-m0
ldr r5,=1001

.cpu cortex-m3
movw r5,#1001

.align

No complaints. Same results.

Trying your suggestion:

.thumb

.cpu cortex-m0
movw r5,#1001

.cpu cortex-m3
movw r5,#1001

.align

arm-none-eabi-gcc so.s -c -o so.o
so.s: Assembler messages:
so.s:6: Error: selected processor does not support `movw r5,#1001' in Thumb mode

And now you have to go rewrite your code. movw is not a good solution.

Edit 2 (for the OP)

Bottom line, short answer: the reason you got that message is that you cannot generate a thumb mov immediate instruction with that immediate value, because, as you will see in the arm documentation, the instruction doesn't have that many bits. If by RPi4 you meant raspberry pi 4, that is an armv8, which supports aarch32 (armv7-a), which supports the thumb2 extensions (which post armv6-m include movw).

.thumb
ldr r5,=1001
.align

Use ldr equals to discover the optimal instruction

arm-none-eabi-as -march=armv7a so.s -o so.o
arm-none-eabi-objdump -D so.o

so.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <.text>:
   0:   f240 35e9   movw    r5, #1001   ; 0x3e9

and then use that directly if you wish

.thumb
ldr r5,=1001
movw r5,#1001
.align

Disassembly of section .text:

00000000 <.text>:
   0:   f240 35e9   movw    r5, #1001   ; 0x3e9
   4:   f240 35e9   movw    r5, #1001   ; 0x3e9

If this is indeed a raspberry pi 4 then you need the armv7-ar architectural reference manual to cover the aarch32 stuff and the armv8 (not 8m) architectural reference manual to cover the aarch64 stuff. For aarch64 you also need a different gnu toolchain, as it is a completely different instruction set (aarch64-whatever-whatever vs arm-whatever-whatever). And there are no thumb instructions in aarch64 (yet).