cstime error in /proc/pid/stat file

stime or cstime in /proc/pid/stat file is so huge that doesn't make any sense. But just some processes have this wrong cstime on occasion. Just as following:

# ps -eo pid,ppid,stime,etime,time,%cpu,%mem,command |grep nsc
4815     1 Jan08  1-01:20:02 213503-23:34:33 20226149  0.1 /usr/sbin/nscd
#
# cat /proc/4815/stat
4815 (nscd) S 1 4815 4815 0 -1 4202560 2904 0 0 0 21 1844674407359 0 0 20 0 9 0 4021 241668096 326 18446744073709551615 139782748139520 139782748261700 140737353849984 140737353844496 139782734487251 0 0 3674112 16390 18446744073709551615 0 0 17 1 0 0 0 0 0

You can see the stime of proc 4815, nscd, is 1844674407359, equal to 213503-23:34:33, but has just been running for 1-01:20:02.

Another problem process has wrong cstime is following:

a bash fork a sh, which fork a sleep.

8155 (bash) S 3124 8155 8155 0 -1 4202752 1277 6738 0 0 3 0 4 1844674407368 20 0 1 0 1738175 13258752 451 18446744073709551615 4194304 4757932 140736528897536 140736528896544 47722675403157 0 65536 4100 65538 18446744071562341410 0 0 17 5 0 0 0 0 0

8184 (sh) S 8155 8155 8155 0 -1 4202496 475 0 0 0 0 0 0 0 20 0 1 0 1738185 11698176 357 18446744073709551615 4194304 4757932 140733266239472 140733266238480 47964680542613 0 65536 4100 65538 18446744071562341410 0 0 17 6 0 0 0 0 0

8185 (sleep) S 8184 8155 8155 0 -1 4202496 261 0 0 0 0 0 0 0 20 0 1 0 1738186 8577024 177 18446744073709551615 4194304 4212204 140734101195248 140734101194776 48002231427168 0 0 4096 0 0 0 0 17 7 0 0 0 0 0

So you can see that cstime in proc bash is 1844674407368, which is much larger than total cpu time of its children.

My server has one Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, which is 4 cores and 8 threads. Operating system is Suse Linux Enterprise Server SP1 x86_64, as following

# lsb_release  -a
LSB Version:    core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64:desktop-4.0-amd64:desktop-4.0-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch
Distributor ID: SUSE LINUX
Description:    SUSE Linux Enterprise Server 11 (x86_64)
Release:    11
Codename:   n/a
#
# uname -a
Linux node2 2.6.32.12-0.7-xen #1 SMP 2010-05-20 11:14:20 +0200 x86_64 x86_64 x86_64 GNU/Linux

So is it the kernel's problem? Can anybody help fix it?

Solution

I suspect that you might be simply seeing a kernel bug. Update to the latest offered update kernel for SLES (which is something like 2.6.32.42 or so) and see if it still occurs. Btw, it's stime, not cstime, that is unusually high—in fact, looking close you will notice it is a value that is like a string truncation of 18446744073709551615 (2^64-1) ±a few clocks offset.

pid_nr: 4815
tcomm: (nscd)
state: S
ppid: 1
pgid: 4815
sid: 4815
tty_nr: 0
tty_pgrp: -1
task_flags: 4202560 / 0x402040
min_flt: 2904
cmin_flt: 0
max_flt: 0
cmax_flt: 0
utime: 21 clocks (= 21 clocks) (= 0.210000 s)
stime: 1844674407359 clocks (= 1844674407359 clocks) (= 18446744073.590000 s)
cutime: 0 clocks (= 0 clocks) (= 0.000000 s)
cstime: 0 clocks (= 0 clocks) (= 0.000000 s)
priority: 20
nice: 0
num_threads: 9
always-zero: 0
start_time: 4021
vsize: 241668096
get_mm_rss: 326
rsslim: 18446744073709551615 / 0xffffffffffffffff
mm_start_code: 139782748139520 / 0x7f21b50c7000
mm_end_code: 139782748261700 / 0x7f21b50e4d44
mm_start_stack: 140737353849984 / 0x7ffff7fb9c80
esp: 140737353844496 / 0x7ffff7fb8710
eip: 139782734487251 / 0x7f21b43c1ed3
obsolete-pending-signals: 0 / 0x0
obsolete-blocked-signals: 0 / 0x0
obsolete-sigign: 3674112 / 0x381000
obsolete-sigcatch: 16390 / 0x4006
wchan: 18446744073709551615 / 0xffffffffffffffff
always-zero: 0
always-zero: 0
task_exit_signal: 17
task_cpu: 1
task_rt_priority: 0
task_policy: 0
delayacct_blkio_ticks: 0
gtime: 0 clocks (= 0 clocks) (= 0.000000 s)
cgtime: 0 clocks (= 0 clocks) (= 0.000000 s)