Tags: linux, linux-kernel, embedded-linux, yocto, debug-symbols

Builtin Platform Driver __initcall Not Called on Linux Kernel Init


Background

I am bringing up a Linux kernel via Yocto for some vendor-provided embedded hardware. I have configured the image to boot via fitImage with an initramfs and no rootfs (there is persistent storage but this is entirely for userspace application use). Think PXE live image and you won't be far off.

Things had been going well until my initramfs image crossed the ~128MB mark. Below this size, everything boots as expected and all drivers are bound without issue. Above it, the kernel still boots but many drivers, though not all, are not bound. This is quite perplexing, as all drivers are statically built into the kernel (no modules are used on this platform). Unfortunately, one of these drivers runs the platform watchdog, so the failure causes entirely predictable reboots.

Thus far I have verified that all of the symbols are present in the vmlinux image:

$ objdump -x vmlinux | grep mtk_wdt
0000000000000000 l    df *ABS*  0000000000000000 mtk_wdt.c
ffffff800880ac40 l     F .text  000000000000004c mtk_wdt_stop
ffffff800880ac90 l     F .text  0000000000000040 mtk_wdt_shutdown
ffffff80091de778 l     F .init.text     0000000000000020 mtk_wdt_driver_init
ffffff800880acd0 l     F .text  000000000000004c mtk_wdt_ping
ffffff800880ad20 l     F .text  0000000000000070 mtk_wdt_set_timeout
ffffff800880ad90 l     F .text  0000000000000074 mtk_wdt_start
ffffff800880ae08 l     F .text  0000000000000144 mtk_wdt_resume
ffffff800880af50 l     F .text  0000000000000120 mtk_wdt_suspend
ffffff800880b070 l     F .text  0000000000000080 mtk_wdt_remove
ffffff80088977a8 l     F .text  0000000000000210 mtk_wdt_isr
ffffff80091fe0a0 l     F .exit.text     000000000000001c mtk_wdt_driver_exit
ffffff800880b4f0 l     F .text  0000000000000310 mtk_wdt_probe
ffffff8008c2acd8 l     O .rodata        0000000000000028 mtk_wdt_info
ffffff8008c2ad00 l     O .rodata        0000000000000050 mtk_wdt_ops
ffffff8008c2ad98 l     O .rodata        00000000000000b8 mtk_wdt_pm_ops
ffffff8008c2ae50 l     O .rodata        0000000000000190 mtk_wdt_dt_ids
ffffff80093a3cb8 l     O .data  00000000000000b0 mtk_wdt_driver
ffffff800a199368 l     O .bss   0000000000000008 mtk_wdt1
ffffff8009285598 l     O .init.data     0000000000000008 __initcall_mtk_wdt_driver_init6
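The `__initcall_*` entries in .init.data should be laid out in the order the kernel walks them, so sorting those symbols by address gives a rough view of what runs before a given driver. A minimal sketch, assuming the symbol naming seen in the listing above (`__initcall_<function><level>`; other kernel versions may name these differently) and `objdump -t` output on stdin. `list_initcalls` is a hypothetical helper name, not an existing tool:

```shell
# Read `objdump -t vmlinux` output on stdin and print the initcall table
# entries as "<address> <function><level>", sorted by address.
list_initcalls() {
    # Keep only symbols whose name (last field) starts with __initcall_,
    # sort on the fixed-width hex address, then strip the common prefix.
    awk '$NF ~ /^__initcall_/ { print $1, $NF }' \
        | LC_ALL=C sort \
        | sed 's/ __initcall_/ /'
}
```

Usage would be along the lines of `objdump -t vmlinux | list_initcalls`; everything printed above the `mtk_wdt_driver_init6` line is a candidate for having run first.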

Additionally, I have sha256 checksums for the binary kernel (e.g. linux.bin), initramfs and device tree before fitImage assembly, after fitImage assembly, and after unpacking into system memory (via the bootloader); all match. As near as I can tell, what gets built is what gets unpacked and booted.

Furthermore, I have enabled initcall_debug and, while I see other __initcall()s, the non-bound drivers are, unsurprisingly, missing.

I know the devices are present in the device tree and correctly configured. After boot, I get about 5 seconds of console access before the watchdog kicks; just enough time to get a command or two off. On both "working" images (initramfs < ~128MB) and failing images (initramfs > ~128MB), the contents of /sys/bus/platform/devices are identical and I can see (among others) my watchdog:

$ ls -lha /sys/bus/platform/devices
...
lrwxrwxrwx 1 root root 0 Jan  1 00:00 10007000.watchdog -> ../../../devices/platform/10007000.watchdog

Performing the same test but comparing /sys/bus/platform/drivers shows the drivers which weren't __initcall()ed as missing.

Some other things I have checked:

  • Device Tree. The same DTB is used in both the working and broken images. I have also verified the device tree in-memory as mentioned above. Device tree overlays are not used.
  • Real memory load offsets. Everything is where it should be and there is plenty of space in each region. I can move the kernel around in memory and the issue persists regardless of location.
  • Bad memory. This failure happens identically across multiple units.
  • Bad compression. The issue manifests regardless of kernel / initramfs compression. Currently I am testing with everything uncompressed to minimize breakage points.
  • Bad signing. I have disabled signature verification (applied to the fitImage partition image after all else); no dice there either.
  • Attempting to bundle the initramfs directly into the kernel. No change. Right now the initramfs is built into the fitImage but otherwise loaded and verified independently.
  • Bad kernel command line arguments. I am using root=/dev/ram initrd=0x48000000,384M and have traced this all the way into init/initramfs.c where unpacking is done. I was able to verify the offsets passed are, indeed, in the correct virtual memory space and sum to 384M.
  • Updating the kernel linker script per this forum post. I am able to see a .initramfs section generated in vmlinux via objdump, yet the issue persists.
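The `initrd=0x48000000,384M` check in the list above comes down to simple address arithmetic, which can be sketched as follows. `initrd_last_byte` is a hypothetical helper name, and the size is assumed to be given in MiB:

```shell
# Print the last byte address covered by initrd=<start>,<size>M.
#   $1 = start address (hex, with 0x prefix)
#   $2 = size in MiB
initrd_last_byte() {
    printf '0x%X\n' "$(( $1 + $2 * 1024 * 1024 - 1 ))"
}
```

For the values used here, `initrd_last_byte 0x48000000 384` prints `0x5FFFFFFF`.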

Given all of the above, the only thing I don't know how to verify is the jump from vmlinux to linux.bin. This is done in Yocto via objcopy as follows:

[ -n "${vmlinux_path}" ] && ${OBJCOPY} -O binary -R .note -R .comment -S "${vmlinux_path}" linux.bin

Questions

  1. How can I verify a given symbol is included in the final linux.bin?
  2. What mechanisms would affect inclusion or exclusion of a given symbol at build time?
  3. Which pieces of the kernel build and runtime are affected by initramfs size?
  4. Are there any other tools / techniques / tribal wisdom which can help debug this situation?

EDIT 1:

Below is the basic memory map of where everything lands and space utilization. As mentioned above and in the comments, I can relocate the kernel, DTB and initramfs to (almost) arbitrary locations but the issue still persists.

0x40000000 - 0x40001000 = Bootloader arg area (Fixed usage)
0x40080000 - 0x41EDFFFF = Kernel (~12MB / 29.5MB used)
0x41E00000 - 0x42FF5FFF = Trampoline (96 bytes / ~6MB used)
0x42FF6000 - 0x42FFFFFF = ATF BL3-1 (Fixed usage)
0x43000000 - 0x43FFFFFF = Trusted OS (~476K / 16M used)
0x44000000 - 0x44FFFFFF = DTB (~77.3K / 16M used)
0x45000000 - 0x47FFFFFF = Trusted OS memory (dynamic)
0x48000000 - 0x5FFFFFFF = Initramfs (~129MB / 384MB used)
0x60000000 - MEM END    = Free
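A map like this can be sanity-checked mechanically for overlapping regions. The following is only a sketch, assuming the ranges have been reduced to "start end name" lines sorted by start address; `find_overlaps` is a hypothetical helper name:

```shell
# Read "start end name" lines (hex with 0x prefix, sorted by start address)
# on stdin and report any region that begins before the previous one ends.
find_overlaps() {
    prev_end=-1
    prev_name=
    while read -r start end name; do
        [ -n "$start" ] || continue          # skip blank lines
        if [ "$(( $start ))" -le "$prev_end" ]; then
            echo "$prev_name overlaps $name"
        fi
        # Track the furthest end address seen so far.
        if [ "$(( $end ))" -gt "$prev_end" ]; then
            prev_end=$(( $end ))
            prev_name=$name
        fi
    done
}
```

Fed the first two ranges exactly as listed above, this prints `Kernel overlaps Trampoline` (the kernel line ends at 0x41EDFFFF while the trampoline line starts at 0x41E00000), which may be nothing more than a typo in the map but is cheap to rule out.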

Solution

As with most kernel issues, the real problem was not where I thought it was. It turned out that one of the other drivers earlier in the initcall list was hanging the core, which prevented the watchdog driver from ever being registered. How this is affected by the initramfs size is beyond me and is its own question.
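With initcall_debug enabled, a hang of this kind tends to show up as a "calling" line with no matching "initcall ... returned" line. A minimal sketch for spotting that in a captured boot log, assuming the usual initcall_debug message format; `unfinished_initcalls` is a hypothetical helper name:

```shell
# Read a boot log on stdin and print every initcall that was started
# ("calling  fn+0x0/0x20 @ 1") but never logged a return
# ("initcall fn+0x0/0x20 returned 0 after N usecs").
unfinished_initcalls() {
    awk '
    {
        for (i = 1; i < NF; i++) {
            if ($i == "calling") {
                name = $(i + 1); sub(/\+.*/, "", name)
                started[name] = 1; order[++n] = name
                break
            }
            if ($i == "initcall" && $(i + 2) == "returned") {
                name = $(i + 1); sub(/\+.*/, "", name)
                delete started[name]
                break
            }
        }
    }
    END {
        # Print in the order the calls were started.
        for (k = 1; k <= n; k++)
            if (order[k] in started) print order[k]
    }'
}
```

Running something like `dmesg | unfinished_initcalls` (or piping in a serial console capture) should leave only the initcall that never came back.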

    For anyone who comes across this in the future, the answers to my specific questions above are listed below:

    1. How can I verify a given symbol is included in the final linux.bin?

    I was not able to figure out how to do this statically. That said, I was able to print the addresses of the init functions at runtime by adding printk()s to do_initcall_level in init/main.c. The addresses printed can then be compared to the output of objdump on vmlinux (see my question for the incantation).
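One static approach that might work, offered only as a sketch: `objcopy -O binary` writes a raw memory dump that starts at the load address of the lowest section copied, so a symbol's position in linux.bin should be its vmlinux address minus that base. This assumes load and virtual addresses coincide in vmlinux. The base used in the example below (ffffff8008080000) is a made-up value for illustration and would need to be read from `objdump -h vmlinux`; `bin_offset` is a hypothetical helper name:

```shell
# Print the byte offset in linux.bin at which a vmlinux address should land.
#   $1 = symbol address (hex, no 0x prefix), e.g. from `objdump -t vmlinux`
#   $2 = lowest load address among the copied sections (hex, no 0x prefix)
bin_offset() {
    # Drop a shared leading "ffff" so the values stay within the signed
    # 64-bit range that shell arithmetic uses.
    sym=${1#ffff}
    base=${2#ffff}
    echo "$(( 0x$sym - 0x$base ))"
}

# Example with the mtk_wdt_driver_init address from the question and an
# ASSUMED base address:
#   bin_offset ffffff80091de778 ffffff8008080000
```

The bytes at that offset in linux.bin (for example via `dd` with `bs=1 skip=<offset>` piped into `od -A n -t x1`) could then be compared with the same function's bytes in vmlinux.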

    A really useful and in-depth description of the initcall process can be found here.

    Note that you can also turn on initcall_debug, which will print each function name. In my case I wanted raw addresses which is why I chose the printk() method.

    2. What mechanisms would affect inclusion or exclusion of a given symbol at build time?

    Most of this boils down to your .config. The vast majority of inclusion / exclusion is done via the preprocessor. Other useful items are the common linker script header at include/asm-generic/vmlinux.lds.h and the platform linker script for your device at arch/<arch>/*/*.lds.
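As a quick first check on the .config side, the state of a single option can be read out directly. A minimal sketch, with `config_state` as a hypothetical helper name; the option name in the usage note is my assumption for the mtk_wdt driver and should be confirmed against drivers/watchdog/Kconfig in your tree:

```shell
# Read a kernel .config on stdin and print the state of one option:
# its value (y, m, a number, a string), "n" if explicitly not set,
# or "absent" if the option does not appear at all.
#   $1 = full option name, e.g. CONFIG_FOO
config_state() {
    awk -v opt="$1" '
        $0 ~ "^" opt "="             { sub(/^[^=]*=/, ""); print; found = 1; exit }
        $0 == "# " opt " is not set" { print "n"; found = 1; exit }
        END                          { if (!found) print "absent" }
    '
}
```

For example, `config_state CONFIG_MEDIATEK_WATCHDOG < .config` would be expected to print `y` for a built-in driver.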

    3. Which pieces of the kernel build and runtime are affected by initramfs size?

    No idea on this one, still.

    4. Are there any other tools / techniques / tribal wisdom which can help debug this situation?

    Don't panic