My Raspberry Pi 4B dies every time it does something (for example, when a backup job starts). I'm running Arch Linux (armv7l) on it. Memory usage is always below 15%.
Below is the log, including the output of free -hw, which was logged 7 seconds before the OOM.
net-restart.sh is a simple bash script. The most complicated thing it does is ping, so there's no reason for it to cause an OOM when more than 3 GiB is free. Sometimes the OOM is triggered by a PostgreSQL vacuum service, sometimes by an rsync-based backup. Once it goes OOM, it just starts killing one process after another until the system dies completely.
I have upgraded the kernel (and other packages) a few times since this started happening, and there was no software change before it began. Could it be a hardware problem?
Btw, I have also tried adding swap (2 GiB), but it didn't help.
23:00:02 free[10890]: total used free shared buffers cache available
23:00:02 free[10890]: Mem: 3,7Gi 82Mi 3,2Gi 2,0Mi 0,0Ki 442Mi 3,6Gi
23:00:02 free[10890]: Swap: 0B 0B 0B
23:00:09 kernel: oom_kill_process: 13 callbacks suppressed
23:00:09 kernel: net-restart.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
23:00:09 kernel: CPU: 2 PID: 10992 Comm: net-restart.sh Tainted: G C 6.1.14-1-rpi-ARCH #1
23:00:09 kernel: Hardware name: BCM2711
23:00:09 kernel: unwind_backtrace from show_stack+0x18/0x1c
23:00:09 kernel: show_stack from dump_stack_lvl+0x90/0xac
23:00:09 kernel: dump_stack_lvl from dump_header+0x54/0x1fc
23:00:09 kernel: dump_header from oom_kill_process+0x23c/0x248
23:00:09 kernel: oom_kill_process from out_of_memory+0x218/0x34c
23:00:09 kernel: out_of_memory from __alloc_pages+0xa98/0x1044
23:00:09 kernel: __alloc_pages from __pmd_alloc+0x3c/0x1d8
23:00:09 kernel: __pmd_alloc from copy_page_range+0xcac/0xcc4
23:00:09 kernel: copy_page_range from dup_mm+0x440/0x5a4
23:00:09 kernel: dup_mm from copy_process+0xda0/0x164c
23:00:09 kernel: copy_process from kernel_clone+0xac/0x3a8
23:00:09 kernel: kernel_clone from sys_clone+0x78/0x9c
23:00:09 kernel: sys_clone from ret_fast_syscall+0x0/0x1c
23:00:09 kernel: Exception stack(0xf08b1fa8 to 0xf08b1ff0)
23:00:09 kernel: 1fa0: b6fd0088 00000001 01200011 00000000 00000000 00000000
23:00:09 kernel: 1fc0: b6fd0088 00000001 b6efae58 00000078 bea210fc 0055d2bc bea2107c 005844e0
23:00:09 kernel: 1fe0: b6fd05a0 bea20f08 b6e2d260 b6e2d684
23:00:09 kernel: Mem-Info:
23:00:09 kernel: active_anon:7451 inactive_anon:603 isolated_anon:0
active_file:39567 inactive_file:70065 isolated_file:0
unevictable:0 dirty:143 writeback:0
slab_reclaimable:3166 slab_unreclaimable:6791
mapped:23163 shmem:594 pagetables:267
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:848488 free_pcp:30 free_cma:80063
23:00:09 kernel: Node 0 active_anon:29804kB inactive_anon:2412kB active_file:158268kB inactive_file:280260kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:92652kB dirty:572kB writeback:0kB shmem:2376kB writeback_tmp:0kB kernel_stack:2360kB pagetables:1068kB sec_pagetab>
23:00:09 kernel: DMA free:323468kB boost:0kB min:3236kB low:4044kB high:4852kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:8076kB inactive_file:279068kB unevictable:0kB writepending:0kB present:786432kB managed:664228kB mlocked:0kB bounce:0kB free_pcp:120kB >
23:00:09 kernel: lowmem_reserve[]: 0 0 3188 3188
23:00:09 kernel: DMA: 143*4kB (UMEC) 119*8kB (UMEC) 68*16kB (UMEC) 23*32kB (UEC) 1*64kB (C) 1*128kB (C) 0*256kB 1*512kB (C) 0*1024kB 0*2048kB 78*4096kB (C) = 323540kB
23:00:09 kernel: 110236 total pagecache pages
23:00:09 kernel: 0 pages in swap cache
23:00:09 kernel: Free swap = 0kB
23:00:09 kernel: Total swap = 0kB
23:00:09 kernel: 1012736 pages RAM
23:00:09 kernel: 816128 pages HighMem/MovableOnly
23:00:09 kernel: 30551 pages reserved
23:00:09 kernel: 81920 pages cma reserved
23:00:09 kernel: Tasks state (memory values in pages):
23:00:09 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
23:00:09 kernel: [ 242] 0 242 12050 4296 98304 0 -250 systemd-journal
23:00:09 kernel: [ 243] 0 243 7022 1837 61440 0 -1000 systemd-udevd
23:00:09 kernel: [ 516] 81 516 2843 1047 49152 0 -900 dbus-daemon
23:00:09 kernel: [ 550] 0 550 2422 1664 45056 0 -1000 sshd
23:00:09 kernel: [ 554] 0 554 196576 7435 167936 0 -999 containerd
23:00:09 kernel: [ 651] 0 651 203978 13307 245760 0 -500 dockerd
23:00:09 kernel: [ 10882] 978 10882 4543 2764 65536 0 0 systemd-resolve
23:00:09 kernel: [ 10888] 0 10888 1097 201 36864 0 0 agetty
23:00:09 kernel: [ 10889] 977 10889 6022 965 65536 0 0 systemd-timesyn
23:00:09 kernel: [ 10890] 0 10890 2676 341 49152 0 0 free
23:00:09 kernel: [ 10897] 0 10897 3543 1493 57344 0 0 systemd-logind
23:00:09 kernel: [ 10992] 0 10992 2169 824 40960 0 0 net-restart.sh
23:00:09 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=net-restart.service,mems_allowed=0,global_oom,task_memcg=/,task=systemd-resolve,pid=10882,uid=978
23:00:09 kernel: Out of memory: Killed process 10882 (systemd-resolve) total-vm:18172kB, anon-rss:1548kB, file-rss:9508kB, shmem-rss:0kB, UID:978 pgtables:64kB oom_score_adj:0
I've tried to reduce the memory usage of my rsync backup, I've added a service that logs memory stats to see what's going on, and I've tried adding swap. I'm still puzzled.
This is a known issue: https://github.com/raspberrypi/linux/issues/5395
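The log is consistent with this. On a 32-bit kernel, the GFP_KERNEL allocation in the backtrace (a page table for fork) can only be served from lowmem, which here is the 768 MiB DMA zone; the 3 GiB reported free by free -hw is mostly HighMem. The DMA zone shows free:323468kB, but nearly all of that is CMA (free_cma:80063 pages, i.e. 320252 kB, and almost every block in the DMA buddy list is tagged C), and CMA pages cannot satisfy non-movable allocations. Assuming those CMA pages all sit in the DMA zone (its own free_cma field is cut off in the log), that leaves about 323468 - 320252 = 3216 kB, just below the zone's min watermark of 3236 kB, even though inactive_file shows 279068 kB of page cache in the same zone that reclaim should have been able to free. As far as I can tell, that failure to reclaim from one specific zone is what the linked issue describes. A minimal sketch of the same check against a live system follows; the helper name lowmem_headroom is hypothetical, and it simplifies the kernel's watermark test by ignoring lowmem_reserve ("protection") and higher-order requests.

```shell
#!/usr/bin/env bash
# lowmem_headroom (hypothetical helper): for every populated zone in
# /proc/zoneinfo (or a saved copy), compare the pages usable by a
# non-movable order-0 allocation (free minus free CMA) with the min watermark.
# Simplification: ignores lowmem_reserve ("protection") and higher orders.
lowmem_headroom() {
    local src="${1:-/proc/zoneinfo}"
    awk '
        function flush() {
            # Skip empty placeholder zones (managed == 0)
            if (zone != "" && managed + 0 > 0) {
                usable = free - cma
                printf "%s free=%d cma=%d usable=%d min=%d %s\n",
                    zone, free, cma, usable, min,
                    (usable <= min ? "BELOW_MIN" : "ok")
            }
            zone = ""
        }
        # A line like "Node 0, zone      DMA" starts a new zone section
        /^Node [0-9]+, zone/ {
            flush()
            zone = $4; free = min = cma = managed = 0
            next
        }
        $1 == "pages" && $2 == "free" { free = $3; next }
        $1 == "min"                   { min = $2; next }
        $1 == "managed"               { managed = $2; next }
        $1 == "nr_free_cma"           { cma = $2; next }
        END { flush() }
    ' "$src"
}
```

All values are in pages. Fed the DMA numbers from the log converted to 4 KiB pages (80867 free, 80063 free CMA, min 809), it prints "DMA free=80867 cma=80063 usable=804 min=809 BELOW_MIN".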
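The bug was fixed by upstream Linux commit https://github.com/torvalds/linux/commit/669281ee7ef731fb5204df9d948669bf32a5e68d
That commit was released in kernel 6.6 and backported to the 6.1 stable branch in 6.1.54. The kernel in your log (6.1.14-1-rpi-ARCH) predates that backport, so upgrading to 6.1.54 or later should resolve it.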
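To check whether a given kernel release already includes the fix, here is a small sketch based only on the two versions mentioned above (6.6, and 6.1.54 on the 6.1 branch). The helper name mglru_fix_status is hypothetical, and it deliberately prints "unknown" for other branches, since their backport status isn't covered here.

```shell
#!/usr/bin/env bash
# mglru_fix_status (hypothetical helper): classify a kernel release string
# against the fix versions 6.6 and 6.1.54. Other branches print "unknown".
mglru_fix_status() {
    local rel="${1:-$(uname -r)}"
    # Drop the local suffix: "6.1.14-1-rpi-ARCH" -> "6.1.14", "6.1.54-v7l+" -> "6.1.54"
    local ver="${rel%%[-+]*}"
    local major minor patch
    IFS=. read -r major minor patch <<<"$ver"
    patch="${patch:-0}"
    if (( major > 6 || (major == 6 && minor >= 6) )); then
        echo fixed
    elif (( major == 6 && minor == 1 )); then
        if (( patch >= 54 )); then echo fixed; else echo affected; fi
    else
        echo unknown
    fi
}
```

Called without an argument it checks the running kernel via uname -r; mglru_fix_status 6.1.14-1-rpi-ARCH prints "affected".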
If you can't easily update the kernel, a workaround is to disable MGLRU (multi-gen LRU) at runtime. This setting does not survive a reboot:
$ echo 0 | sudo tee /sys/kernel/mm/lru_gen/enabled
0
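A minimal sketch for checking the current state afterwards; the helper name mglru_enabled is hypothetical. It relies on the sysfs file holding a hex bitmask (for example 0x0007 with all MGLRU components on, 0x0000 after the echo above) and on the file being absent when the kernel was built without CONFIG_LRU_GEN.

```shell
#!/usr/bin/env bash
# mglru_enabled (hypothetical helper): print "on" if any MGLRU component bit
# is set, "off" if the mask is zero, "absent" if the sysfs file is missing.
mglru_enabled() {
    local f="${1:-/sys/kernel/mm/lru_gen/enabled}"
    [[ -r "$f" ]] || { echo absent; return; }
    local mask
    mask="$(<"$f")"   # e.g. "0x0007" or "0x0000"; bash arithmetic parses hex
    if (( mask != 0 )); then echo on; else echo off; fi
}
```

To reapply the workaround at every boot, a systemd-tmpfiles entry should work, e.g. the line "w /sys/kernel/mm/lru_gen/enabled - - - - 0" in a file such as /etc/tmpfiles.d/mglru.conf (the file name is arbitrary).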
Alternatively, build the kernel with CONFIG_LRU_GEN disabled.
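To see how a kernel was built, a small sketch that greps its configuration; the helper name lru_gen_config is hypothetical. /proc/config.gz only exists if the kernel was built with CONFIG_IKCONFIG_PROC, so a plain .config from the build tree can be passed instead (zcat -f passes uncompressed input through). CONFIG_LRU_GEN_ENABLED only controls whether MGLRU is switched on by default when it is compiled in.

```shell
#!/usr/bin/env bash
# lru_gen_config (hypothetical helper): print the CONFIG_LRU_GEN and
# CONFIG_LRU_GEN_ENABLED lines of a kernel config (gzip-compressed or plain).
lru_gen_config() {
    local cfg="${1:-/proc/config.gz}"
    # Matches "CONFIG_LRU_GEN=y" as well as "# CONFIG_LRU_GEN is not set",
    # but not unrelated options such as CONFIG_LRU_GEN_STATS
    zcat -f -- "$cfg" | grep -E '^(# )?CONFIG_LRU_GEN(_ENABLED)?[ =]'
}
```

An output of "# CONFIG_LRU_GEN is not set" means MGLRU is compiled out, and the lru_gen sysfs directory will not exist.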