
OOM for no reason (Arch/Raspberry)


My Raspberry Pi 4B dies whenever it does any real work (for example, when the backup job starts). It runs Arch Linux (armv7l). Memory usage is always below 15%.

Below is the log, including the output of free -hw, which was logged 7 seconds before the OOM.

net-restart.sh is a simple bash script. The most complicated thing it does is ping, so there's no reason for it to cause an OOM when more than 3 GiB is free. Sometimes the OOM is triggered by the PostgreSQL vacuum service, sometimes by the rsync-based backup. Once it starts, the OOM killer kills one process after another until the system dies completely.

I have upgraded the kernel (and other packages) a few times since this started happening, and there was no software change before it started. Could it be a hardware problem?

By the way, I have also tried adding 2 GiB of swap, but it didn't help.

23:00:02 free[10890]:                total        used        free      shared     buffers       cache   available
23:00:02 free[10890]: Mem:           3,7Gi        82Mi       3,2Gi       2,0Mi       0,0Ki       442Mi       3,6Gi
23:00:02 free[10890]: Swap:             0B          0B          0B

23:00:09 kernel: oom_kill_process: 13 callbacks suppressed
23:00:09 kernel: net-restart.sh invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
23:00:09 kernel: CPU: 2 PID: 10992 Comm: net-restart.sh Tainted: G         C         6.1.14-1-rpi-ARCH #1
23:00:09 kernel: Hardware name: BCM2711
23:00:09 kernel:  unwind_backtrace from show_stack+0x18/0x1c
23:00:09 kernel:  show_stack from dump_stack_lvl+0x90/0xac
23:00:09 kernel:  dump_stack_lvl from dump_header+0x54/0x1fc
23:00:09 kernel:  dump_header from oom_kill_process+0x23c/0x248
23:00:09 kernel:  oom_kill_process from out_of_memory+0x218/0x34c
23:00:09 kernel:  out_of_memory from __alloc_pages+0xa98/0x1044
23:00:09 kernel:  __alloc_pages from __pmd_alloc+0x3c/0x1d8
23:00:09 kernel:  __pmd_alloc from copy_page_range+0xcac/0xcc4
23:00:09 kernel:  copy_page_range from dup_mm+0x440/0x5a4
23:00:09 kernel:  dup_mm from copy_process+0xda0/0x164c
23:00:09 kernel:  copy_process from kernel_clone+0xac/0x3a8
23:00:09 kernel:  kernel_clone from sys_clone+0x78/0x9c
23:00:09 kernel:  sys_clone from ret_fast_syscall+0x0/0x1c
23:00:09 kernel: Exception stack(0xf08b1fa8 to 0xf08b1ff0)
23:00:09 kernel: 1fa0:                   b6fd0088 00000001 01200011 00000000 00000000 00000000
23:00:09 kernel: 1fc0: b6fd0088 00000001 b6efae58 00000078 bea210fc 0055d2bc bea2107c 005844e0
23:00:09 kernel: 1fe0: b6fd05a0 bea20f08 b6e2d260 b6e2d684
23:00:09 kernel: Mem-Info:
23:00:09 kernel: active_anon:7451 inactive_anon:603 isolated_anon:0
                                                 active_file:39567 inactive_file:70065 isolated_file:0
                                                 unevictable:0 dirty:143 writeback:0
                                                 slab_reclaimable:3166 slab_unreclaimable:6791
                                                 mapped:23163 shmem:594 pagetables:267
                                                 sec_pagetables:0 bounce:0
                                                 kernel_misc_reclaimable:0
                                                 free:848488 free_pcp:30 free_cma:80063
23:00:09 kernel: Node 0 active_anon:29804kB inactive_anon:2412kB active_file:158268kB inactive_file:280260kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:92652kB dirty:572kB writeback:0kB shmem:2376kB writeback_tmp:0kB kernel_stack:2360kB pagetables:1068kB sec_pagetab>
23:00:09 kernel: DMA free:323468kB boost:0kB min:3236kB low:4044kB high:4852kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:8076kB inactive_file:279068kB unevictable:0kB writepending:0kB present:786432kB managed:664228kB mlocked:0kB bounce:0kB free_pcp:120kB >
23:00:09 kernel: lowmem_reserve[]: 0 0 3188 3188
23:00:09 kernel: DMA: 143*4kB (UMEC) 119*8kB (UMEC) 68*16kB (UMEC) 23*32kB (UEC) 1*64kB (C) 1*128kB (C) 0*256kB 1*512kB (C) 0*1024kB 0*2048kB 78*4096kB (C) = 323540kB
23:00:09 kernel: 110236 total pagecache pages
23:00:09 kernel: 0 pages in swap cache
23:00:09 kernel: Free swap  = 0kB
23:00:09 kernel: Total swap = 0kB
23:00:09 kernel: 1012736 pages RAM
23:00:09 kernel: 816128 pages HighMem/MovableOnly
23:00:09 kernel: 30551 pages reserved
23:00:09 kernel: 81920 pages cma reserved
23:00:09 kernel: Tasks state (memory values in pages):
23:00:09 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
23:00:09 kernel: [    242]     0   242    12050     4296    98304        0          -250 systemd-journal
23:00:09 kernel: [    243]     0   243     7022     1837    61440        0         -1000 systemd-udevd
23:00:09 kernel: [    516]    81   516     2843     1047    49152        0          -900 dbus-daemon
23:00:09 kernel: [    550]     0   550     2422     1664    45056        0         -1000 sshd
23:00:09 kernel: [    554]     0   554   196576     7435   167936        0          -999 containerd
23:00:09 kernel: [    651]     0   651   203978    13307   245760        0          -500 dockerd
23:00:09 kernel: [  10882]   978 10882     4543     2764    65536        0             0 systemd-resolve
23:00:09 kernel: [  10888]     0 10888     1097      201    36864        0             0 agetty
23:00:09 kernel: [  10889]   977 10889     6022      965    65536        0             0 systemd-timesyn
23:00:09 kernel: [  10890]     0 10890     2676      341    49152        0             0 free
23:00:09 kernel: [  10897]     0 10897     3543     1493    57344        0             0 systemd-logind
23:00:09 kernel: [  10992]     0 10992     2169      824    40960        0             0 net-restart.sh
23:00:09 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=net-restart.service,mems_allowed=0,global_oom,task_memcg=/,task=systemd-resolve,pid=10882,uid=978
23:00:09 kernel: Out of memory: Killed process 10882 (systemd-resolve) total-vm:18172kB, anon-rss:1548kB, file-rss:9508kB, shmem-rss:0kB, UID:978 pgtables:64kB oom_score_adj:0

I've tried reducing the memory usage of my rsync backup, I've added a service that logs memory stats to see what's going on, and I've tried adding swap. I'm still puzzled.
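One way to read the report above is to look at the DMA zone rather than at the total. The allocation that failed came from __pmd_alloc with GFP_KERNEL_ACCOUNT, which cannot be served from HighMem, and 1012736 - 816128 = 196608 pages (786432 kB) matches the DMA zone's "present" value, so that zone appears to be all the low memory there is. The sketch below redoes the arithmetic with numbers copied from the log. It assumes that all free CMA pages sit in the DMA zone (the per-zone free_cma field is cut off in the log) and that free CMA pages are not usable for this kind of kernel allocation.

```shell
#!/bin/sh
# Numbers copied from the OOM report in the question.
page_kb=4             # 7451 active_anon pages are reported as 29804kB
free_cma_pages=80063  # "free_cma:80063" in Mem-Info
dma_free_kb=323468    # "DMA free:323468kB"
dma_min_kb=3236       # "min:3236kB" for the DMA zone

# Assumption: every free CMA page belongs to the DMA zone.
cma_free_kb=$((free_cma_pages * page_kb))
non_cma_free_kb=$((dma_free_kb - cma_free_kb))

echo "free CMA:             ${cma_free_kb} kB"
echo "DMA free without CMA: ${non_cma_free_kb} kB (min watermark ${dma_min_kb} kB)"

if [ "$non_cma_free_kb" -lt "$dma_min_kb" ]; then
    echo "non-CMA free memory in the DMA zone is below the min watermark"
fi
```

Under those assumptions, almost all of the "free" low memory is CMA, and the remainder is slightly below the zone's min watermark, even though free -hw reports more than 3 GiB free overall.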


Solution

  • The issue is this one: https://github.com/raspberrypi/linux/issues/5395

    It was fixed by upstream Linux commit https://github.com/torvalds/linux/commit/669281ee7ef731fb5204df9d948669bf32a5e68d

    That commit was released in kernel 6.6 and backported to the 6.1 branch in 6.1.54.

    If you can't easily update the kernel, a workaround is to disable MGLRU:

    $ echo 0 | sudo tee /sys/kernel/mm/lru_gen/enabled
    0
    

    Alternatively, build a kernel with CONFIG_LRU_GEN disabled.
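To see whether a given machine is likely in the affected range, a rough check along these lines may help. It is only a sketch: classify_kernel is a hypothetical helper, not part of any tool, and it encodes just the two version numbers mentioned above (6.6 and 6.1.54). Anything else is reported as "unknown" instead of being guessed.

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel release string against the
# versions named in this answer. Prints: fixed, affected, or unknown.
classify_kernel() {
    ver=${1%%-*}                 # drop suffixes such as "-1-rpi-ARCH"
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    [ "$patch" = "$rest" ] && patch=0   # no patch level, e.g. "6.6"
    patch=${patch%%[!0-9]*}             # keep leading digits only
    [ -n "$patch" ] || patch=0

    if [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 6 ]; }; then
        echo fixed
    elif [ "$major" -eq 6 ] && [ "$minor" -eq 1 ]; then
        if [ "$patch" -ge 54 ]; then echo fixed; else echo affected; fi
    else
        echo unknown             # not covered by the versions named above
    fi
}

release=$(uname -r)
echo "kernel ${release}: $(classify_kernel "$release")"

# The sysfs file only exists when the kernel was built with CONFIG_LRU_GEN.
state_file=/sys/kernel/mm/lru_gen/enabled
if [ -r "$state_file" ]; then
    echo "MGLRU state: $(cat "$state_file")"
else
    echo "MGLRU state: ${state_file} not present"
fi
```

Note that writing 0 to /sys/kernel/mm/lru_gen/enabled only lasts until the next reboot. One possible way to reapply it at boot, assuming systemd-tmpfiles is in use, is a tmpfiles.d entry such as `w /sys/kernel/mm/lru_gen/enabled - - - - 0`.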