
How to diagnose Linux randomly crashing


I have a home server (running a standard desktop configuration) on which I run only a Docker Compose stack.

From time to time it crashes(?) randomly. I only find out when my services become unavailable and I can no longer SSH into it.

I have already tried two different operating systems, Ubuntu Server and NixOS, so I don't suspect the OS to be the source of my problem.

When it happens, I hard reboot the system and it works perfectly fine afterwards.

Here is the log (or rather the lack of one) that I have in syslog, truncated around the moment it crashes.

sept. 11 01:52:25 nixos 9cd85f03e4e6[3105]:   },
sept. 11 01:52:25 nixos 9cd85f03e4e6[3105]:     'statsd.metrics_received': 0
sept. 11 01:52:25 nixos 9cd85f03e4e6[3105]:   },
sept. 11 01:52:25 nixos 9cd85f03e4e6[3105]:   sets: {},
sept. 11 01:52:25 nixos 9cd85f03e4e6[3105]:   pctThreshold: [ 90 ]
sept. 11 01:52:25 nixos 9cd85f03e4e6[3105]: }
sept. 11 02:00:25 nixos systemd[1]: Started Logrotate Service.
sept. 11 02:00:25 nixos systemd[1]: logrotate.service: Deactivated successfully.
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]: Flushing stats at  Mon Sep 11 2023 00:02:25 GMT+0000 (Coordinated Universal Time)
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]: {
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:   counters: {
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:     'statsd.bad_lines_seen': 0,
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:     'statsd.packets_received': 0,
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:     'statsd.metrics_received': 0
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:   },
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:   timers: {},
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:   gauges: { 'statsd.timestamp_lag': 0 },
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:   timer_data: {},
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:   counter_rates: {
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:     'statsd.bad_lines_seen': 0,
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:     'statsd.packets_received': 0,
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:     'statsd.metrics_received': 0
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:   },
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:   sets: {},
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]:   pctThreshold: [ 90 ]
sept. 11 02:02:25 nixos 9cd85f03e4e6[3105]: }
sept. 11 02:02:29 nixos d44e3444dc2e[3105]: 2023-09-11T00:02:29.668Z [MASTER] info: Purging orphaned upload files...
sept. 11 02:02:29 nixos d44e3444dc2e[3105]: 2023-09-11T00:02:29.669Z [MASTER] info: Purging orphaned upload files: [ COMPLETED ]
-- Boot ec700ac6b9a2458896b87f5c459872fe --
sept. 11 17:01:23 nixos kernel: Linux version 6.1.51 (nixbld@localhost) (gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.40) #1-NixOS SMP PREEMPT_DYNAMIC Sat Sep  2 07:16:20 UTC 2023
sept. 11 17:01:23 nixos kernel: Command line: initrd=\efi\nixos\cix17i101cnd1v1q6k8n3zsl6dbf6a9b-initrd-linux-6.1.51-initrd.efi init=/nix/store/582kkbsscbzmvpirdfqc67mr5496y4ci-nixos-syst>
sept. 11 17:01:23 nixos kernel: BIOS-provided physical RAM map:

I don't know how to continue debugging this.
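In the excerpt above, the journal goes straight from an ordinary application message to the "-- Boot ... --" marker with no shutdown messages in between. A minimal sketch of checking a saved journal dump for every boot that starts this way could look like the following. The function name find_unclean_boots and the strings in CLEAN_SHUTDOWN_HINTS are assumptions made for illustration, not something the original post used, and the hint strings may need adjusting for a given distribution.

```python
import re
from collections import deque

# journalctl prints a line like "-- Boot <hex id> --" between boots.
BOOT_MARKER = re.compile(r"^-- Boot ([0-9a-f]+) --$")

# Assumption: a clean shutdown usually leaves one of these strings
# in the last few journal lines before the next boot marker.
CLEAN_SHUTDOWN_HINTS = ("Journal stopped", "Shutting down")


def find_unclean_boots(journal_text, tail_size=5):
    """Return a list of (boot_id, tail_lines, looks_unclean) tuples.

    tail_lines holds the last `tail_size` non-empty lines that precede
    each boot marker found in the journal text.
    """
    tail = deque(maxlen=tail_size)
    findings = []
    for raw_line in journal_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = BOOT_MARKER.match(line)
        if match:
            if tail:  # ignore a marker that has no preceding lines
                looks_unclean = not any(
                    hint in previous
                    for previous in tail
                    for hint in CLEAN_SHUTDOWN_HINTS
                )
                findings.append((match.group(1), list(tail), looks_unclean))
            tail.clear()
        else:
            tail.append(line)
    return findings
```

One way to use it is to save the journal with `journalctl --no-pager > journal.txt` and pass the file contents to find_unclean_boots. A boot flagged as unclean only means the previous session ended without the usual shutdown messages; it does not say whether the cause was power, hardware, or a kernel hang.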


Solution

  • I think my problem is probably the computer itself. I installed Proxmox to run 2 VMs, one for the stack and the other to monitor the first one, and my Proxmox host crashed too, so I'm 99.9% sure that the cause is not on the Linux side.
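
When the logs simply stop, as they do here, a small heartbeat that is forced to disk at a fixed interval can at least narrow down when the machine died, regardless of which OS or hypervisor is installed. The sketch below is only an illustration of that idea: the names write_heartbeat and run_heartbeat, the file path, and the interval are all hypothetical and were not part of the setup described above.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path):
    """Append the current UTC time to `path` and force it to disk."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(stamp + "\n")
        handle.flush()
        # fsync asks the OS to push the data to the storage device,
        # so the line is more likely to survive a sudden power loss.
        os.fsync(handle.fileno())
    return stamp


def run_heartbeat(path, interval_seconds=10.0, beats=None):
    """Write a heartbeat every `interval_seconds`.

    `beats=None` runs until the process is stopped; an integer
    limits the number of heartbeats (useful for a quick check).
    """
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path)
        count += 1
        time.sleep(interval_seconds)
```

For example, run_heartbeat("/var/tmp/heartbeat.log") (a hypothetical path) left running in the background means that, after the next hard reboot, the last line of the file gives the crash time to within one interval. Comparing that time with scheduled jobs, load, or temperature readings may help confirm or rule out a hardware cause.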