[user@centos-vm-02 ~]$ ps aux|grep python
user 4182 0.0 0.0 9228 1080 ? Ss 02:00 0:00 /bin/sh -c cd data/trandata && /usr/local/bin/python2.7 main.py >> /dev/null 2>&1
user 4190 0.1 0.1 341108 10740 ? Sl 02:00 0:52 /usr/local/bin/python2.7 main.py
user 4205 166 1.6 1175176 129312 ? Sl 02:00 901:39 /usr/local/bin/python2.7 main.py
user 10049 0.1 0.1 435856 10712 ? Sl 10:21 0:04 /usr/local/bin/python2.7 main.py
user 10051 71.1 2.5 948248 207628 ? Sl 10:21 28:42 /usr/local/bin/python2.7 main.py
user 10052 51.9 1.9 948380 154688 ? Sl 10:21 20:57 /usr/local/bin/python2.7 main.py
user 10053 85.9 0.9 815104 76652 ? Sl 10:21 34:41 /usr/local/bin/python2.7 main.py
user 11166 0.0 0.0 103240 864 pts/1 S+ 11:01 0:00 grep python
[user@centos-vm-02 ~]$ ps -ef|grep python
user 4182 4174 0 02:00 ? 00:00:00 /bin/sh -c cd /data/trandata && /usr/local/bin/python2.7 main.py >> /dev/null 2>&1
user 4190 4182 0 02:00 ? 00:00:52 /usr/local/bin/python2.7 main.py
user 4205 4190 99 02:00 ? 15:01:46 /usr/local/bin/python2.7 main.py
user 10049 1 0 10:21 ? 00:00:04 /usr/local/bin/python2.7 main.py
user 10051 10049 71 10:21 ? 00:28:47 /usr/local/bin/python2.7 main.py
user 10052 10049 51 10:21 ? 00:21:01 /usr/local/bin/python2.7 main.py
user 10053 10049 85 10:21 ? 00:34:45 /usr/local/bin/python2.7 main.py
user 11168 10904 0 11:01 pts/1 00:00:00 grep python
As we see, I launch a python process that it would spwan multiprocess, and inside the processes, multithreads are started, and inside the threads, multithreads are started.
Process tree like this:
main_process
--sub_process
----thread1
------sub_thread
------sub_thread
------sub_thread
------sub_thread
----thread2
----thread3
--sub_process
----......
Inside the picture, the pid-4205 shows different CPU usage in ps aux and ps -ef, one is 166, and the other is 99, 166 was also shown in top -c.
And I assure that the pid-4205 is one of the sub processes, which means it could not use more than 100% of CPU with GIL in python.
So that's my question, why ps -ef and ps aux show difference.
It's just a sampling artifact. Say a factory produces one car per hour. If you get there right before a car is made and leave right after a car is made, you can see two cars made in a span of time just over an hour, resulting in you thinking the factory is operating at near double capacity.
Update: Let me try to clarify the example. Say a factory produces one car per hour, on the hour. It is incapable of producing more than one car per hour. If you watch the factory from 7:59 to 9:01, you will see two cars produced (one at 8:00 and one at 9:00) in just over one hours (62 minutes). So you would estimate the factory produces about two cars per hour, nearly double its actual production. That is what happened here. It's a sampling artifact caused by top
checking the CPU counters at just the wrong time.