After a few days of reading the ARM Linux kernel boot code, I understand most of it except for several tricky parts in the function __turn_mmu_on:
.align 5
__turn_mmu_on:
mov r0, r0
mcr p15, 0, r0, c1, c0, 0 @ write control reg
mrc p15, 0, r3, c0, c0, 0 @ read id reg
mov r3, r3
mov r3, r3
mov pc, r13
ENDPROC(__turn_mmu_on)
The last instruction, mov pc, r13, will branch to __mmap_switched, as follows:
__mmap_switched:
adr r3, __switch_data + 4
....
r3 is simply overwritten in instruction adr r3, __switch_data + 4?
Alignment is probably not required, but it is likely used to ensure that the whole function fits into a single cache line (.align 5 on ARM means a 32-byte boundary), so that the last few instructions are executed from the cache and don't have to be fetched from memory (even though the function should remain at the same address with the MMU on, because it is identity mapped).
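To make the cache-line argument concrete, here is a small sketch of the arithmetic in C. It assumes a 32-byte cache line, which is typical for ARM cores of that era but is not stated in the kernel source; the helper names align_up and fits_in_one_line and the example addresses are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Round addr up to a 2^log2 boundary, as ".align log2" does on ARM
 * (on ARM the GNU assembler treats the argument as a power of two). */
static unsigned long align_up(unsigned long addr, unsigned log2)
{
    assert(log2 < sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = (1UL << log2) - 1;
    return (addr + mask) & ~mask;
}

/* True if the byte range [start, start + size) lies inside a single
 * cache line of line_size bytes (line_size is an assumption here). */
static int fits_in_one_line(unsigned long start, unsigned long size,
                            unsigned long line_size)
{
    return (start / line_size) == ((start + size - 1) / line_size);
}
```

__turn_mmu_on is six 4-byte ARM instructions, so 24 bytes. With a hypothetical start address of 0x8030, fits_in_one_line(0x8030, 24, 32) is false because the code straddles two lines, while fits_in_one_line(align_up(0x8030, 5), 24, 32) is true: after .align 5 the function starts at 0x8040 and sits entirely in one line.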
It was not easy to track down the origin of the MRC instruction, but I think I found it:
Date: 2004-04-04 04:35 +200
To: linux-arm-patches
Subject: [Linux-arm-patches] 1204.1: XSCALE processor stalls when enabling MMU
--- kernel-source-2.5.21-rmk/arch/arm/kernel/head.S Sun Jun 9 07:26:29 2002
+++ kernel-2.5.21-was/arch/arm/kernel/head.S Fri Jul 12 20:41:42 2002
@@ -118,9 +118,7 @@ __turn_mmu_on:
orr r0, r0, #2 @ ...........A.
#endif
mcr p15, 0, r0, c1, c0
- mov r0, r0
- mov r0, r0
- mov r0, r0
+ cpwait r10
mov pc, lr
[...]
+/*
+ * cpwait - wait for coprocessor operation to finish
+ * this is the canonical way to wait for cp updates
+ * on PXA2x0 as proposed by Intel
+ */
+ .macro cpwait reg
+ mrc p15, 0, \reg, c2, c0, 0 @ arbitrary cp reg read
+ mov r0, r0 @ nop
+ sub pc, pc, #4 @ nop
+ .endm
The ensuing discussion on the merits of this patch ended in the current approach:
...
We can however get closer to the Xscale recommended sequence by knowing how things work on other CPUs, and knowing what we're doing here. If we insert the following instruction after the mcr, then this should solve your issue.
mrc p15, 0, r0, c1, c0
Since the read-back of the same register is guaranteed by the ARM architecture manual to return the value that was written there (if it doesn't, the CPU isn't an ARM compliant implementation), this means we can guarantee that the write to the register has taken effect. The use of the "mov r0, r0" instructions are the same as in the CPWAIT macro. The mov pc, lr is equivalent to the "sub pc, pc, #4" (they are defined to be the same class of instructions), so merely adding one instruction should guarantee that the Xscale works as expected.
...
The original patch was from Lothar Wassmann; the final code is probably by Russell King.