Recently I have been experimenting with MMU initialization code on a Raspberry Pi 2 and ran into some strange behavior. What I am trying to do is set up a trivial section mapping.
I used this code as a reference. A brief review showed that it is written for the BCM2835, but I still don't have anything better than that.
The problem I encountered was a dead end after the cache flush. Here is the full start_mmu function:
.globl start_mmu
start_mmu:
mov r2,#0
mcr p15,0,r2,c7,c7,0 ;@ invalidate caches
mcr p15,0,r2,c8,c7,0 ;@ invalidate tlb
mcr p15,0,r2,c7,c10,4 ;@ DSB ??
mvn r2,#0
bic r2,#0xC
mcr p15,0,r2,c3,c0,0 ;@ domain
mcr p15,0,r0,c2,c0,0 ;@ tlb base
mcr p15,0,r0,c2,c0,1 ;@ tlb base
mrc p15,0,r2,c1,c0,0
orr r2,r2,r1
mcr p15,0,r2,c1,c0,0
In other words, I hit a dead end at the cache invalidation line:
mcr p15,0,r2,c7,c7,0 ;@ invalidate caches
By dead end I mean that I can't print anything after this line has executed. It seems that I am falling into some exception at that point. If I omit this cache invalidation line, I can go on, but it seems that the MMU mapping isn't established correctly after my setup (but that is another question). What I want to know is:
1.) Why do we need to invalidate the caches and TLB before MMU startup?
2.) What could be the reason for the dead-end problem?
Why do we need to invalidate the caches and TLB before MMU startup?
Because they could contain uninitialised junk (or just stale entries after a reset). As soon as you turn the MMU on, addresses for instruction/data accesses may be looked up in the TLBs, and if any of that junk happens to look sufficiently like a valid entry matching the relevant virtual address then you're going to have a bad time. Similarly for the instructions/data themselves once the caches are enabled.
What could be the reason for the dead-end problem?
You're executing an invalid instruction.
If you want to write bare-metal code, it pays to understand the metal you're running on. The Raspberry Pi 2 has Cortex-A7 cores, which are not the same as the ARM1176 core in the other models, and thus behave differently. Specifically in this case, the CP15 encoding space (CRn = c7, opc1 = 0, CRm = c7) where the unified cache operations lived under the ARMv6 architecture is no longer allocated in ARMv7, so attempting to access it leads to unpredictable behaviour. You need to invalidate your I-cache and D-cache separately. I'd recommend at the very least looking at the Cortex-A7 TRM, and ideally the Architecture Reference Manual. Also, for real-world examples, there's always Linux and friends. Yes, it's an awful lot to take in, but hey, this is a full-blown multi-core mobile-class application processor, not some microcontroller ;)
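To make "separately" concrete, here is a rough sketch of what could stand in for the `mcr p15,0,r2,c7,c7,0` line on an ARMv7-A core. Treat it as an illustration under assumptions, not a drop-in fix: the function name `invalidate_caches_v7` is made up, the code is assumed to be assembled for ARMv7-A in ARM state (e.g. `-mcpu=cortex-a7`), a valid stack is assumed, the core is assumed to be in a privileged mode with caches and MMU still off, and nothing in the cache is assumed to be worth keeping (invalidating without cleaning discards dirty lines). It only walks the level 1 data cache; any further levels reported by CLIDR would need the same loop with the level field set.

```asm
.globl invalidate_caches_v7          ;@ hypothetical name, not from the original code
invalidate_caches_v7:
    push {r4-r6}                     ;@ assumes a valid stack
    mov r0,#0
    mcr p15,0,r0,c7,c5,0             ;@ ICIALLU: invalidate entire I-cache
    mcr p15,0,r0,c7,c5,6             ;@ BPIALL: invalidate branch predictor
    mcr p15,0,r0,c8,c7,0             ;@ TLBIALL: invalidate unified TLB
    mcr p15,2,r0,c0,c0,0             ;@ CSSELR = 0: select level 1 data cache
    isb
    mrc p15,1,r0,c0,c0,0             ;@ CCSIDR: geometry of the selected cache
    and r1,r0,#7
    add r1,r1,#4                     ;@ r1 = log2(line size in bytes) = set shift
    ubfx r2,r0,#3,#10                ;@ r2 = number of ways - 1
    clz r3,r2                        ;@ r3 = way shift (way index sits in the top bits)
    ubfx r4,r0,#13,#15               ;@ r4 = number of sets - 1
dc_set_loop:
    mov r5,r2                        ;@ r5 = current way, counting down
dc_way_loop:
    mov r6,r5,lsl r3                 ;@ way field
    orr r6,r6,r4,lsl r1              ;@ set field; level field [3:1] stays 0 (L1)
    mcr p15,0,r6,c7,c6,2             ;@ DCISW: invalidate D-cache line by set/way
    subs r5,r5,#1
    bge dc_way_loop
    subs r4,r4,#1
    bge dc_set_loop
    dsb                              ;@ wait for the maintenance operations to complete
    isb
    pop {r4-r6}
    bx lr
```

Called before start_mmu (with the old c7,c7,0 line removed from it), this covers the I-cache, branch predictor, TLB and L1 D-cache one by one, which is the ARMv7 way of doing what the single ARMv6 operation used to do.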
Now, the first priority should be to set up some exception vector handlers that will give some debug output when things go wrong, because a lot more things are bound to go wrong from here on in...
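A minimal sketch of such vector handlers might look like the following. Everything named here is an assumption for illustration: `install_vectors`, `exception_stack_top` and `report_exception` are hypothetical (the last one stands for whatever C routine you already use to print over the UART, taking the exception kind, faulting PC, fault status and fault address). It also assumes ARM state, a privileged non-Hyp mode, and SCTLR.V = 0 so that VBAR is actually used.

```asm
.globl install_vectors               ;@ hypothetical name
install_vectors:
    ldr r0,=vector_table
    mcr p15,0,r0,c12,c0,0            ;@ VBAR = vector_table (32-byte aligned)
    isb
    bx lr

.balign 32
vector_table:
    b hang                           ;@ 0x00 reset
    b undef_entry                    ;@ 0x04 undefined instruction
    b hang                           ;@ 0x08 supervisor call
    b pabt_entry                     ;@ 0x0C prefetch abort
    b dabt_entry                     ;@ 0x10 data abort
    b hang                           ;@ 0x14 not used here
    b hang                           ;@ 0x18 IRQ
    b hang                           ;@ 0x1C FIQ

undef_entry:
    ldr sp,=exception_stack_top      ;@ hypothetical symbol: banked sp is not set up yet
    mov r0,#1                        ;@ kind = undefined instruction
    sub r1,lr,#4                     ;@ address of the offending instruction (ARM state)
    mov r2,#0
    mov r3,#0
    b report
pabt_entry:
    ldr sp,=exception_stack_top
    mov r0,#3                        ;@ kind = prefetch abort
    sub r1,lr,#4                     ;@ address of the aborted instruction
    mrc p15,0,r2,c5,c0,1             ;@ IFSR: instruction fault status
    mrc p15,0,r3,c6,c0,2             ;@ IFAR: instruction fault address
    b report
dabt_entry:
    ldr sp,=exception_stack_top
    mov r0,#4                        ;@ kind = data abort
    sub r1,lr,#8                     ;@ address of the aborting load/store
    mrc p15,0,r2,c5,c0,0             ;@ DFSR: data fault status
    mrc p15,0,r3,c6,c0,0             ;@ DFAR: data fault address
report:
    bl report_exception              ;@ hypothetical C function (kind, pc, status, address)
hang:
    b hang
```

With something like this installed early, the bad `mcr` would most likely have shown up as an undefined instruction report pointing at its address instead of a silent hang.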