I am trying to use clang/llvm as a cross compiler for ARM Cortex-M.
Based on some LLVM pages, this is how I am building the toolchain:
rm -rf /opt/llvm/llvm10armv6m
rm -rf llvm-project
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout llvmorg-10.0.0
mkdir build
cd build
cmake -DLLVM_ENABLE_PROJECTS='clang;lld' -DCMAKE_CROSSCOMPILING=True -DCMAKE_INSTALL_PREFIX=/opt/llvm/llvm10armv6m -DLLVM_DEFAULT_TARGET_TRIPLE=armv6m-none-eabi -DLLVM_TARGET_ARCH=ARM -DLLVM_TARGETS_TO_BUILD=ARM -G "Unix Makefiles" ../llvm
make -j 8
make -j 4
make
sudo make install
test.c
void fun ( unsigned int, unsigned int );
int test ( void )
{
    unsigned int ra;
    unsigned int rx;
    for(rx=0;;rx++)
    {
        ra=rx;
        ra|=((~rx)&0xFF)<<16;
        fun(0x12345678,ra);
    }
    return(0);
}
clang -Wall -O2 -nostdlib -ffreestanding -fomit-frame-pointer -c test.c -o test.o
arm-none-eabi-objdump -D test.o
Disassembly of section .text:
00000000 <test>:
0: 20ff movs r0, #255 ; 0xff
2: 0405 lsls r5, r0, #16
4: 2600 movs r6, #0
6: 4c06 ldr r4, [pc, #24] ; (20 <test+0x20>)
8: 4637 mov r7, r6
a: 4629 mov r1, r5
c: 43b1 bics r1, r6
e: 4339 orrs r1, r7
10: 4620 mov r0, r4
12: f7ff fffe bl 0 <fun>
16: 2001 movs r0, #1
18: 0400 lsls r0, r0, #16
1a: 1836 adds r6, r6, r0
1c: 1c7f adds r7, r7, #1
1e: e7f4 b.n a <test+0xa>
20: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
(gnu's output is much better)
The problem here is that ARM's ABI says not to destroy r4 and above, yet this code overwrites r4 through r7 without saving them. It also isn't preserving the link register for a return from this function, although I guess the compiler sees that this is an infinite loop and never returns. (Please don't tell me I fell into the LLVM infinite loop bug again.)
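For anyone reading the -O2 listing: the value passed to fun is unchanged, the compiler has only rearranged the arithmetic. r5 holds 0xFF<<16, r6 holds rx<<16 (kept current by adding 0x10000 on every pass), and r7 holds rx. Below is a minimal sketch in C, assuming 32-bit unsigned arithmetic. The names ra_source, ra_listing and forms_agree are made up for illustration and are not part of test.c.

```c
#include <stdint.h>

/* The value of ra exactly as test.c computes it. */
uint32_t ra_source(uint32_t rx)
{
    uint32_t ra = rx;
    ra |= ((~rx) & 0xFFu) << 16;
    return ra;
}

/* The same value, computed the way the -O2 listing does it. */
uint32_t ra_listing(uint32_t rx)
{
    uint32_t r5 = 0xFFu << 16; /* movs r0,#255 ; lsls r5,r0,#16 */
    uint32_t r6 = rx << 16;    /* the listing adds 0x10000 per pass instead */
    uint32_t r7 = rx;          /* adds r7,r7,#1 keeps the plain counter */
    uint32_t r1 = r5 & ~r6;    /* mov r1,r5 ; bics r1,r6 */
    r1 |= r7;                  /* orrs r1,r7 */
    return r1;
}

/* Returns 1 if both forms agree for rx = 0 .. count-1, else 0. */
int forms_agree(uint32_t count)
{
    for (uint32_t rx = 0; rx < count; rx++) {
        if (ra_source(rx) != ra_listing(rx)) {
            return 0;
        }
    }
    return 1;
}
```

Calling forms_agree(1u << 18) covers the point where rx<<16 wraps around in 32 bits, so it also exercises the overflow case of the shifted counter.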
With the frame pointer it doesn't get any better:
00000000 <test>:
0: b580 push {r7, lr}
2: af00 add r7, sp, #0
4: 20ff movs r0, #255 ; 0xff
6: 0405 lsls r5, r0, #16
8: 2400 movs r4, #0
a: 4626 mov r6, r4
c: 4629 mov r1, r5
e: 43a1 bics r1, r4
10: 4331 orrs r1, r6
12: 4804 ldr r0, [pc, #16] ; (24 <test+0x24>)
14: f7ff fffe bl 0 <fun>
18: 2001 movs r0, #1
1a: 0400 lsls r0, r0, #16
1c: 1824 adds r4, r4, r0
1e: 1c76 adds r6, r6, #1
20: e7f4 b.n c <test+0xc>
22: 46c0 nop ; (mov r8, r8)
24: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
Building the toolchain for
armv6m-none-gnueabi
didn't make it any better.
But if I take a generic apt-gotten clang/llvm:
clang -Wall -O2 -nostdlib -ffreestanding -fomit-frame-pointer -target armv6m-none-gnueabi -mthumb -mcpu=cortex-m0 -c test.c -o test.o
arm-none-eabi-objdump -D test.o
Disassembly of section .text:
00000000 <test>:
0: b5f0 push {r4, r5, r6, r7, lr}
2: b081 sub sp, #4
4: 20ff movs r0, #255 ; 0xff
6: 0405 lsls r5, r0, #16
8: 2600 movs r6, #0
a: 4c06 ldr r4, [pc, #24] ; (24 <test+0x24>)
c: 4637 mov r7, r6
e: 4629 mov r1, r5
10: 43b1 bics r1, r6
12: 4339 orrs r1, r7
14: 4620 mov r0, r4
16: f7ff fffe bl 0 <fun>
1a: 2001 movs r0, #1
1c: 0400 lsls r0, r0, #16
1e: 1836 adds r6, r6, r0
20: 1c7f adds r7, r7, #1
22: e7f4 b.n e <test+0xe>
24: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
the problem is gone.
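To compare prologues like the ones above without reading them by eye, the first halfword can be decoded directly. The sketch below assumes the 16-bit Thumb PUSH encoding (bits 1011 010 M followed by an 8-bit list for r0-r7, where M means lr is pushed). is_thumb_push and thumb_push_mask are hypothetical helper names, not anything from binutils or LLVM.

```c
#include <stdint.h>

/* Returns 1 if hw is a 16-bit Thumb PUSH, else 0. */
int is_thumb_push(uint16_t hw)
{
    return (hw & 0xFE00u) == 0xB400u;
}

/*
 * Returns a mask of the pushed registers: bit n set means rN is pushed,
 * bit 14 set means lr is pushed. Returns 0 if hw is not a PUSH.
 */
uint16_t thumb_push_mask(uint16_t hw)
{
    uint16_t mask;

    if (!is_thumb_push(hw)) {
        return 0;
    }
    mask = (uint16_t)(hw & 0x00FFu);     /* r0-r7 list */
    if (hw & 0x0100u) {
        mask = (uint16_t)(mask | 0x4000u); /* M bit: lr */
    }
    return mask;
}
```

With the halfwords from the listings, 0xb5f0 decodes to r4-r7 plus lr and 0xb580 to r7 plus lr, while 0x20ff (the movs that opens the problem build) is not a push at all.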
Now yes, at the time of this writing the built one is v10 and the apt-gotten one is v6. (About building v10: why does it take an eternity to build? Why are the binaries so huge?)
Using the same command line against the built one changes nothing; it still has the ABI problem.
Now if I don't optimize, the registers are handled fine, but perhaps that is just dumb luck:
00000000 <test>:
0: b580 push {r7, lr}
2: b082 sub sp, #8
4: 2000 movs r0, #0
6: 9000 str r0, [sp, #0]
8: e7ff b.n a <test+0xa>
a: 9800 ldr r0, [sp, #0]
c: 9001 str r0, [sp, #4]
e: 4668 mov r0, sp
10: 7800 ldrb r0, [r0, #0]
12: 21ff movs r1, #255 ; 0xff
14: 4048 eors r0, r1
16: 0400 lsls r0, r0, #16
18: 9901 ldr r1, [sp, #4]
1a: 4301 orrs r1, r0
1c: 9101 str r1, [sp, #4]
1e: 9901 ldr r1, [sp, #4]
20: 4803 ldr r0, [pc, #12] ; (30 <test+0x30>)
22: f7ff fffe bl 0 <fun>
26: e7ff b.n 28 <test+0x28>
28: 9800 ldr r0, [sp, #0]
2a: 1c40 adds r0, r0, #1
2c: 9000 str r0, [sp, #0]
2e: e7ec b.n a <test+0xa>
30: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
Links are frowned upon at SO, so I'll quote instead. The page is titled
How To Cross-Compile Clang/LLVM using Clang/LLVM
and it has info like this:
The CMake options you need to add are:
-DCMAKE_CROSSCOMPILING=True
-DCMAKE_INSTALL_PREFIX=<install-dir>
-DLLVM_TABLEGEN=<path-to-host-bin>/llvm-tblgen
-DCLANG_TABLEGEN=<path-to-host-bin>/clang-tblgen
-DLLVM_DEFAULT_TARGET_TRIPLE=arm-linux-gnueabihf
-DLLVM_TARGET_ARCH=ARM
-DLLVM_TARGETS_TO_BUILD=ARM
I started with the GNU triple I normally use, as the page mentions, but then saw that LLVM accepts a sub-architecture in the triple, so I added that in. Initially it all looked good, until I made a program with more than a few lines in it.
Am I building llvm incorrectly? Or is this simply the llvm infinite loop thing? (or other...)
Updated build script:
export THEPLACE=/opt/llvm/llvm10armv6m
export THETARGET=armv6m-none-eabi
rm -rf $THEPLACE
rm -rf llvm-project
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout llvmorg-10.0.0
mkdir build
cd build
cmake \
-DLLVM_ENABLE_PROJECTS='clang;lld' \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CROSSCOMPILING=True \
-DCMAKE_INSTALL_PREFIX=$THEPLACE \
-DLLVM_DEFAULT_TARGET_TRIPLE=$THETARGET \
-DLLVM_TARGET_ARCH=ARM \
-DLLVM_TARGETS_TO_BUILD=ARM \
-G "Unix Makefiles" \
../llvm
make -j 8
make -j 4
make
sudo make install
The tblgen options apparently aren't needed. In theory -G "Unix Makefiles" is supposed to produce makefiles that can be built in parallel, but I did have an issue with that: in one or two places it worked, in one it didn't, and I had to run make again and again, or eventually serially. Hence the repeated makes at the end.
With the Release build the binaries are SIGNIFICANTLY smaller: instead of tens of GB, the whole install is something like 1.something GB.
I don't think the build is any faster. Still on par with building gcc in the 1990s for duration.
The answer is pretty easy: your function never returns. Therefore it does not make any sense to save / restore callee-saved registers.
If you change your source to allow the function to terminate, like this:
void fun ( unsigned int, unsigned int );
unsigned bar(void);
int test ( void )
{
    unsigned int ra;
    unsigned int rx;
    for(rx=0;rx<bar();rx++)
    {
        ra=rx;
        ra|=((~rx)&0xFF)<<16;
        fun(0x12345678,ra);
    }
    return(0);
}
everything will be saved / restored as you expect.
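If you want to sanity-check the terminating version on the host before looking at the ARM output, here is a small sketch. The stubs for fun and bar, the loop limit of 3, and the names test_bounded, call_count and nth_arg are my assumptions for illustration. It only checks the C-level behaviour, not which registers the compiler saves.

```c
#define MAX_CALLS 8

static unsigned int seen[MAX_CALLS]; /* second argument of each fun() call */
static unsigned int calls;           /* number of fun() calls so far */

/* Stub: record what the loop passes in. */
void fun(unsigned int a, unsigned int b)
{
    (void)a;
    if (calls < MAX_CALLS) {
        seen[calls] = b;
    }
    calls++;
}

/* Stub: the loop bound; 3 is an arbitrary choice. */
unsigned bar(void)
{
    return 3;
}

/* Same body as the terminating version above, renamed. */
int test_bounded(void)
{
    unsigned int ra;
    unsigned int rx;
    for (rx = 0; rx < bar(); rx++) {
        ra = rx;
        ra |= ((~rx) & 0xFF) << 16;
        fun(0x12345678, ra);
    }
    return 0;
}

/* Accessors for the recorded calls. */
unsigned int call_count(void)
{
    return calls;
}

unsigned int nth_arg(unsigned int i)
{
    return i < MAX_CALLS ? seen[i] : 0;
}
```

After one call to test_bounded(), call_count() reports how many times fun was reached and nth_arg(i) gives the ra value of the i-th call, which can be compared against the constants you expect on the target.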
PS: I will not comment on whether an infinite loop is UB.
PPS: You will certainly want to compile llvm/clang in Release mode: the binaries will be smaller and the link time will drop dramatically.