Search code examples
clinuxdeadlockethernetwatchdog

Finding and controlling ethernet watchdog timers


I have a debian linux box (Debian Squeeze) with Broadcom ethernet ports that deadlocks every few hours if I run ethtool to modify NIC buffers before I sniff an interface.

From what I can tell, a kernel watchdog timer is tripping...

NETDEV WATCHDOG: eth3 (bnx2): transmit queue 0 timed out

I think there is a way to control watchdog timers with the system ioctl call (ref: EmbeddedFreak: How to use linux watchdog).

Question

How can I find which watchdog timer(s) is controlling eth3? Bonus points if you can tell me how to change the timer or even disable the watchdog...


Stack trace

Apr 30 08:38:44 Hotcoffee kernel: [275460.837147] ------------[ cut here ]------------
Apr 30 08:38:44 Hotcoffee kernel: [275460.837166] WARNING: at /build/buildd-linux-2.6_2.6.32-41squeeze2-amd64-NDo8b7/linux-2.6-2.6.32/debian/build/source_amd64_none/net/sched/sch_generic.c:261 dev_watchdog+0xe2/0x194()
Apr 30 08:38:44 Hotcoffee kernel: [275460.837169] Hardware name: PowerEdge R710
Apr 30 08:38:44 Hotcoffee kernel: [275460.837171] NETDEV WATCHDOG: eth3 (bnx2): transmit queue 0 timed out
Apr 30 08:38:44 Hotcoffee kernel: [275460.837172] Modules linked in: 8021q garp stp parport_pc ppdev lp parport pci_stub vboxpci vboxnetadp vboxnetflt vboxdrv ext2 loop psmouse power_meter button dcdbas evdev pcspkr processor serio_raw ext4 mbcache jbd2 crc16 sg sr_mod cdrom ses ata_generic sd_mod usbhid hid crc_t10dif enclosure uhci_hcd ehci_hcd megaraid_sas ata_piix thermal libata usbcore nls_base scsi_mod bnx2 thermal_sys [last unloaded: scsi_wait_scan]
Apr 30 08:38:44 Hotcoffee kernel: [275460.837202] Pid: 0, comm: swapper Not tainted 2.6.32-5-amd64 #1
Apr 30 08:38:44 Hotcoffee kernel: [275460.837204] Call Trace:
Apr 30 08:38:44 Hotcoffee kernel: [275460.837206]  <IRQ>  [<ffffffff81263086>] ? dev_watchdog+0xe2/0x194
Apr 30 08:38:44 Hotcoffee kernel: [275460.837211]  [<ffffffff81263086>] ? dev_watchdog+0xe2/0x194
Apr 30 08:38:44 Hotcoffee kernel: [275460.837217]  [<ffffffff8104df9c>] ? warn_slowpath_common+0x77/0xa3
Apr 30 08:38:44 Hotcoffee kernel: [275460.837220]  [<ffffffff81262fa4>] ? dev_watchdog+0x0/0x194
Apr 30 08:38:44 Hotcoffee kernel: [275460.837223]  [<ffffffff8104e024>] ? warn_slowpath_fmt+0x51/0x59
Apr 30 08:38:44 Hotcoffee kernel: [275460.837228]  [<ffffffff8104a4ba>] ? try_to_wake_up+0x289/0x29b
Apr 30 08:38:44 Hotcoffee kernel: [275460.837231]  [<ffffffff81262f78>] ? netif_tx_lock+0x3d/0x69
Apr 30 08:38:44 Hotcoffee kernel: [275460.837237]  [<ffffffff8124dda3>] ? netdev_drivername+0x3b/0x40
Apr 30 08:38:44 Hotcoffee kernel: [275460.837240]  [<ffffffff81263086>] ? dev_watchdog+0xe2/0x194
Apr 30 08:38:44 Hotcoffee kernel: [275460.837242]  [<ffffffff8103fa2a>] ? __wake_up+0x30/0x44
Apr 30 08:38:44 Hotcoffee kernel: [275460.837249]  [<ffffffff8105a71b>] ? run_timer_softirq+0x1c9/0x268
Apr 30 08:38:44 Hotcoffee kernel: [275460.837252]  [<ffffffff81053dc7>] ? __do_softirq+0xdd/0x1a6
Apr 30 08:38:44 Hotcoffee kernel: [275460.837257]  [<ffffffff8102462a>] ? lapic_next_event+0x18/0x1d
Apr 30 08:38:44 Hotcoffee kernel: [275460.837262]  [<ffffffff81011cac>] ? call_softirq+0x1c/0x30
Apr 30 08:38:44 Hotcoffee kernel: [275460.837265]  [<ffffffff8101322b>] ? do_softirq+0x3f/0x7c
Apr 30 08:38:44 Hotcoffee kernel: [275460.837267]  [<ffffffff81053c37>] ? irq_exit+0x36/0x76
Apr 30 08:38:44 Hotcoffee kernel: [275460.837270]  [<ffffffff810250f8>] ? smp_apic_timer_interrupt+0x87/0x95
Apr 30 08:38:44 Hotcoffee kernel: [275460.837273]  [<ffffffff81011673>] ? apic_timer_interrupt+0x13/0x20
Apr 30 08:38:44 Hotcoffee kernel: [275460.837274]  <EOI>  [<ffffffffa01bc509>] ? acpi_idle_enter_bm+0x27d/0x2af [processor]
Apr 30 08:38:44 Hotcoffee kernel: [275460.837283]  [<ffffffffa01bc502>] ? acpi_idle_enter_bm+0x276/0x2af [processor]
Apr 30 08:38:44 Hotcoffee kernel: [275460.837289]  [<ffffffff8123a0ba>] ? cpuidle_idle_call+0x94/0xee
Apr 30 08:38:44 Hotcoffee kernel: [275460.837293]  [<ffffffff8100fe97>] ? cpu_idle+0xa2/0xda
Apr 30 08:38:44 Hotcoffee kernel: [275460.837297]  [<ffffffff8151c140>] ? early_idt_handler+0x0/0x71
Apr 30 08:38:44 Hotcoffee kernel: [275460.837301]  [<ffffffff8151ccdd>] ? start_kernel+0x3dc/0x3e8
Apr 30 08:38:44 Hotcoffee kernel: [275460.837304]  [<ffffffff8151c3b7>] ? x86_64_start_kernel+0xf9/0x106
Apr 30 08:38:44 Hotcoffee kernel: [275460.837306] ---[ end trace 92c65e52c9e327ec ]---

Solution

  • The link you provided discusses a system watchdog device, which can be used to force-reboot a computer if the system stops responding. This has nothing at all to do with the watchdog timers in network interface drivers or other parts of the kernel. Access to these is typically not available in user space; you'd have to go digging deep into the appropriate NIC driver to see what's going on.