linux perl oops ssh telnet

Perl Script causing Kernel Panic


I wrote a plugin for Nagios/Icinga that parses networking device logs for strings, but it is causing kernel panics in large environments. The full code can be found here. I've tried reinstalling the kernel and some packages, but the problem persists. The same plugin runs fine on another server, but that server monitors fewer hosts. How can I troubleshoot the Oops so that I can fix the code or repair the server?

The script uses Net::OpenSSH to connect to different networking devices and run "sh log". An example excerpt:

my $cisco_cmd = 'sh log ';

# SSH
if ($socket) {
    SSH();

    # Cisco SSH command
    my $ssh_session = $ssh->system({stdout_fh => $stdout_fh}, $cisco_cmd);
}

sub SSH {
    $ssh = Net::OpenSSH->new($host,
        user             => $username,
        password         => $password,
        timeout          => 30,
        master_stdout_fh => $stdout_fh,
        master_stderr_fh => $stdout_fh,
        master_opts      => [-o => "KexAlgorithms=+diffie-hellman-group1-sha1",
                             -o => "HostKeyAlgorithms=+ssh-dss",
                             -o => "StrictHostKeyChecking no"]);
    if ($ssh->error) {
        print "Unknown - Unable to connect to remote host: " . $ssh->error . "\n";
        exit 3;
    }

    return $ssh;
}

Over time, syslog starts logging Oopses, and eventually the machine goes into a kernel panic.

The Oops is:

Oct  6 14:11:43 icinga1 kernel: [359392.196625] BUG: unable to handle kernel NULL pointer dereference at           (null)
Oct  6 14:11:43 icinga1 kernel: [359392.196632] IP: [<ffffffff814fa7c5>] tty_ioctl+0x375/0xc40
Oct  6 14:11:43 icinga1 kernel: [359392.196640] PGD 0 
Oct  6 14:11:43 icinga1 kernel: [359392.196642] Oops: 0000 [#1] SMP 
Oct  6 14:11:43 icinga1 kernel: [359392.196645] Modules linked in: vmw_vsock_vmci_transport vsock coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd vmw_balloon input_leds joydev serio_raw shpchp i2c_piix4 vmw_vmci 8250_fintek mac_hid sunrpc parport_pc ppdev lp parport autofs4 vmw_pvscsi vmwgfx ttm drm_kms_helper syscopyarea psmouse sysfillrect sysimgblt mptspi fb_sys_fops mptscsih drm mptbase ahci vmxnet3 libahci scsi_transport_spi pata_acpi floppy fjes
Oct  6 14:11:43 icinga1 kernel: [359392.196672] CPU: 1 PID: 21451 Comm: ssh Not tainted 4.4.0-96-generic #119-Ubuntu
Oct  6 14:11:43 icinga1 kernel: [359392.196684] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
Oct  6 14:11:43 icinga1 kernel: [359392.196686] task: ffff88042d008000 ti: ffff88033194c000 task.ti: ffff88033194c000
Oct  6 14:11:43 icinga1 kernel: [359392.196688] RIP: 0010:[<ffffffff814fa7c5>]  [<ffffffff814fa7c5>] tty_ioctl+0x375/0xc40
Oct  6 14:11:43 icinga1 kernel: [359392.196691] RSP: 0018:ffff88033194fdf0  EFLAGS: 00010246
Oct  6 14:11:43 icinga1 kernel: [359392.196692] RAX: 0000000000000000 RBX: ffff8803c8aec800 RCX: fffffffeffffffff
Oct  6 14:11:43 icinga1 kernel: [359392.196693] RDX: fffffffe00000001 RSI: 0000000000000000 RDI: ffff8803c8aec828
Oct  6 14:11:43 icinga1 kernel: [359392.196694] RBP: ffff88033194fe98 R08: 0000563d7cc698e0 R09: 0000563d7c0cc880
Oct  6 14:11:43 icinga1 kernel: [359392.196695] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000005401
Oct  6 14:11:43 icinga1 kernel: [359392.196697] R13: 00007ffef42a03b0 R14: ffff8803d1d9c500 R15: 0000000000000000
Oct  6 14:11:43 icinga1 kernel: [359392.196699] FS:  00002b64e6d8f000(0000) GS:ffff88043fc40000(0000) knlGS:0000000000000000
Oct  6 14:11:43 icinga1 kernel: [359392.196700] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  6 14:11:43 icinga1 kernel: [359392.196701] CR2: 0000000000000000 CR3: 00000000bac36000 CR4: 00000000000406e0
Oct  6 14:11:43 icinga1 kernel: [359392.196784] Stack:
Oct  6 14:11:43 icinga1 kernel: [359392.196786]  ffff8802bf401600 ffff88033194fe20 ffffffff812276f2 ffff8802bf401600
Oct  6 14:11:43 icinga1 kernel: [359392.196788]  ffff8802bf401658 ffff8802bf401600 ffff88033194fe58 ffffffff8122795e
Oct  6 14:11:43 icinga1 kernel: [359392.196790]  ffff880428845000 0000000000000008 ffff8802cc420cb0 ffff88042d5d2920
Oct  6 14:11:43 icinga1 kernel: [359392.196792] Call Trace:
Oct  6 14:11:43 icinga1 kernel: [359392.196800]  [<ffffffff812276f2>] ? __dentry_kill+0x162/0x1e0
Oct  6 14:11:43 icinga1 kernel: [359392.196802]  [<ffffffff8122795e>] ? dput+0x1ee/0x220
Oct  6 14:11:43 icinga1 kernel: [359392.196818]  [<ffffffff81231224>] ? mntput+0x24/0x40
Oct  6 14:11:43 icinga1 kernel: [359392.196822]  [<ffffffff81211f00>] ? __fput+0x190/0x220
Oct  6 14:11:43 icinga1 kernel: [359392.196824]  [<ffffffff81223faf>] do_vfs_ioctl+0x29f/0x490
Oct  6 14:11:43 icinga1 kernel: [359392.196826]  [<ffffffff81211fce>] ? ____fput+0xe/0x10
Oct  6 14:11:43 icinga1 kernel: [359392.196830]  [<ffffffff8109f116>] ? task_work_run+0x86/0xa0
Oct  6 14:11:43 icinga1 kernel: [359392.196832]  [<ffffffff81224219>] SyS_ioctl+0x79/0x90
Oct  6 14:11:43 icinga1 kernel: [359392.196836]  [<ffffffff81843272>] entry_SYSCALL_64_fastpath+0x16/0x71
Oct  6 14:11:43 icinga1 kernel: [359392.196838] Code: 18 48 8b 41 60 48 85 c0 74 16 4c 89 ea 44 89 e6 48 89 df ff d0 3d fd fd ff ff 0f 85 da fd ff ff 48 89 df e8 3e 74 00 00 49 89 c7 <48> 8b 00 4c 8b 40 48 48 c7 c0 ea ff ff ff 4d 85 c0 74 22 44 89 
Oct  6 14:11:43 icinga1 kernel: [359392.196859] RIP  [<ffffffff814fa7c5>] tty_ioctl+0x375/0xc40
Oct  6 14:11:43 icinga1 kernel: [359392.196861]  RSP <ffff88033194fdf0>
Oct  6 14:11:43 icinga1 kernel: [359392.196862] CR2: 0000000000000000
Oct  6 14:11:43 icinga1 kernel: [359392.196865] ---[ end trace 72b7f0a8e26ab854 ]---

Solution

  • The Oops in the log points at tty_ioctl, so it is triggered when ioctl is called on a tty file descriptor. Net::OpenSSH uses a pseudo-tty when doing password authentication, so a possible workaround would be to switch to an authentication mechanism that does not require a tty, such as public key authentication.

    Also, the backtrace shows that the crash happens while the kernel is manipulating the file system, so your real problem may be a corrupted file system triggering some kernel bug. You could try forcing an fsck on the machine's file systems, or simply recreate them.

    The Oops shows that you are running Ubuntu's 4.4.0-96-generic kernel. You could try switching to a different kernel version or distribution.

    In any case, this is the first time anybody has reported this problem, and Net::OpenSSH is frequently used with password authentication, so unless you are using a fairly rare kernel version, there must be something special about your system that causes this bug to show up.
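
As a rough sketch of the public key workaround mentioned above, the connection sub could build its options without a password. This assumes a key without a passphrase has already been installed on the devices. The helper names ssh_options and connect_with_key are hypothetical; key_path, batch_mode, timeout and master_opts are documented Net::OpenSSH constructor options, and the ssh options are copied from the question.

```perl
use strict;
use warnings;

# Build Net::OpenSSH constructor options for key-based authentication.
# No password or passphrase is passed, so the module should not need
# a pseudo-tty to feed credentials to ssh.
sub ssh_options {
    my ($username, $key_path) = @_;
    return (
        user        => $username,
        key_path    => $key_path,   # private key without a passphrase (assumption)
        batch_mode  => 1,           # never prompt for a password or passphrase
        timeout     => 30,
        master_opts => [
            -o => "KexAlgorithms=+diffie-hellman-group1-sha1",
            -o => "HostKeyAlgorithms=+ssh-dss",
            -o => "StrictHostKeyChecking no",
        ],
    );
}

# Connect to $host and exit with the Nagios UNKNOWN status (3) on failure,
# mirroring the error handling in the question's SSH sub.
sub connect_with_key {
    my ($host, $username, $key_path) = @_;
    require Net::OpenSSH;    # loaded lazily so the option builder works without it
    my $ssh = Net::OpenSSH->new($host, ssh_options($username, $key_path));
    if ($ssh->error) {
        print "Unknown - Unable to connect to remote host: " . $ssh->error . "\n";
        exit 3;
    }
    return $ssh;
}
```

It could then be called as connect_with_key($host, $username, $key_path), where $key_path points at whatever private key the monitoring user owns, followed by the same $ssh->system(...) call as before. Whether this avoids the Oops on the affected machine is untested; it only removes the password prompt, which is the one place where Net::OpenSSH needs a pseudo-tty according to the answer above.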