Testing whether niceness is being properly applied

I am trying to prioritize certain processes over others. Here's the main script I'm using, mocking a CPU-intensive process:

simple_app.py

import os
from multiprocessing import Pool, cpu_count

def f(x):
    while True:
        x*x

if __name__ == '__main__':

    cpu = cpu_count()
    pid = os.getpid()

    print('-' * 20)
    print('pid: {}'.format(pid))
    print('Utilizing {} cores'.format(cpu))
    print('Current niceness: {}'.format(os.nice(0)))
    print('-' * 20)

    pool = Pool(cpu)
    pool.map(f, range(cpu))

My next step is to spawn several processes (nine, in this case) that run this code:

simple_app_runner.sh

# Lowest priority (niceness 19)
nice -n 19 python3 simple_app.py &
# Default priority (niceness 0)
nice -n 0 python3 simple_app.py &
# Lower priority (niceness 10)
nice -n 10 python3 simple_app.py &
# Slightly higher than the previous one, still below default (niceness 7)
nice -n 7 python3 simple_app.py &
# Just below default (niceness 1)
nice -n 1 python3 simple_app.py &
# Default priority (niceness 0)
nice -n 0 python3 simple_app.py &
# Default priority (niceness 0)
nice -n 0 python3 simple_app.py &
# Default priority (niceness 0)
nice -n 0 python3 simple_app.py &
# Default priority (niceness 0), run in the foreground
nice -n 0 python3 simple_app.py

I then monitor each process and report the CPU utilization of its children:

process_reporting_server.py

import argparse
import time
from multiprocessing import Pool

import psutil

def most_recent_process_info(pid, interval=0.5):
    # Runs until interrupted, printing one report line per pass.
    while True:
        proc = psutil.Process(pid)
        # cpu_percent(interval) blocks for `interval` seconds per child, so the
        # children are sampled one after another rather than simultaneously.
        children_cpu_percent = [child.cpu_percent(interval) for child in proc.children()]
        # -1 marks a pass in which no children were found.
        children_cpu_percent_mean = sum(children_cpu_percent) / len(children_cpu_percent) if children_cpu_percent else -1.
        print('Time: {}, PID: {}, niceness: {}, average child CPU percent: {:.2f}'.format(
            time.ctime(),
            pid,
            proc.nice(),
            children_cpu_percent_mean)
        )

if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('-p', '--pids', type=str, required=True, help='Whitespace-delimited string containing PIDs', dest='pids')
    # Note: --seconds is parsed but not currently used.
    parser.add_argument('-s', '--seconds', type=int, help='Seconds to sleep', default=10, dest='seconds')
    args = parser.parse_args()

    pids = list(map(int, args.pids.split()))

    # One worker per monitored PID; each worker loops forever.
    pool = Pool(len(pids))
    pool.map(most_recent_process_info, pids)

I want to see whether processes given a lower niceness value are actually being prioritized. So here's what I do:

Run simple_app_runner.sh:

$ ./simple_app_runner.sh 
--------------------
pid: 45036
Utilizing 8 cores
Current niceness: 0
--------------------
--------------------
pid: 45030
Utilizing 8 cores
Current niceness: 19
--------------------
--------------------
pid: 45034
Utilizing 8 cores
Current niceness: 1
--------------------
--------------------
pid: 45032
Utilizing 8 cores
Current niceness: 10
--------------------
--------------------
pid: 45033
Utilizing 8 cores
Current niceness: 7
--------------------
--------------------
pid: 45037
Utilizing 8 cores
Current niceness: 0
--------------------
--------------------
pid: 45038
Utilizing 8 cores
Current niceness: 0
--------------------
--------------------
pid: 45031
Utilizing 8 cores
Current niceness: 0
--------------------
--------------------
pid: 45035
Utilizing 8 cores
Current niceness: 0
--------------------

Then, here's the report:

$ python3 process_reporting_server.py -p '45036 45030 45034 45032 45033 45037 45038 45031 45035'

Cleaning things up a bit and analyzing with pandas, we see that over a five-minute interval, the specified niceness doesn't seem to matter:

>>> df.groupby('nice')['mean_child_cpu'].max()
nice
0.0     10.50
1.0      9.75
7.0      8.28
10.0     8.50
19.0    21.97
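
The cleanup step itself isn't shown above. As a rough illustration only, here is a minimal stdlib sketch of how the reporter's output lines could be parsed and grouped by niceness. It assumes the lines were captured as text in exactly the format printed by process_reporting_server.py; `parse_report_line` and `summarize` are hypothetical names, and the original analysis used pandas rather than this code. It reports the mean alongside the max for comparison and skips the -1.00 "no children" sentinel.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches the tail of one reporter line, e.g.
# "..., PID: 45030, niceness: 19, average child CPU percent: 21.97"
LINE_RE = re.compile(
    r"PID: (?P<pid>\d+), niceness: (?P<nice>-?\d+), "
    r"average child CPU percent: (?P<cpu>-?\d+\.\d+)"
)


def parse_report_line(line):
    """Return (pid, niceness, mean_child_cpu), or None if the line doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return int(m["pid"]), int(m["nice"]), float(m["cpu"])


def summarize(lines):
    """Group samples by niceness and return {niceness: (mean, max)}."""
    by_nice = defaultdict(list)
    for line in lines:
        record = parse_report_line(line)
        if record is None:
            continue
        _, nice, cpu = record
        if cpu < 0:
            # -1.00 is the reporter's sentinel for "no children found".
            continue
        by_nice[nice].append(cpu)
    return {nice: (mean(v), max(v)) for nice, v in sorted(by_nice.items())}
```

Passing the captured lines to `summarize` gives one `(mean, max)` pair per niceness level, which is roughly what the `groupby` above computes for the max.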

Am I missing something completely here? Why does the niceness I specify not seem to affect the prioritization of CPU resources?

Solution

I don't think you are missing anything. My experience is that the top dog process gets first priority and everyone else fights for what's left. You'd probably get the same results (for a purely CPU-bound process like this) if you only reniced one process to -1 and left the rest at -0.

The reason is that people usually don't want prioritization to be as hardcore as we sometimes expect it to be. Right now I'm posting this while my load average is over 200, with a bunch of reniced (higher priority) processes going. If all those processes were truly hogs, it wouldn't be "nice." I like that I can still use my browser with all that CPU load going on.

At one point I was under the impression that you could alter the priority queueing, at least on some Unixes. I vaguely remember some customers of mine demanding we do just that: we (the sysadmin team) said "not a good idea", the customer insisted, we did it, and then the customer demanded we undo it. Scheduling is tricky business.

Here's an intro to what's going on beneath the covers: http://www.cs.montana.edu/~chandrima.sarkar/AdvancedOS/SchedulingLinux/index.html Pay particular attention to the bottom section - "The algorithm does not scale well," which goes hand-in-hand with my first paragraph.
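
If you want to check the first paragraph's claim on your own machine, here is a minimal sketch (not a rigorous benchmark) that starts one busy worker per niceness increment and reports how much CPU time each one actually received. `compare_niceness` and `WORKER` are names made up for this example. It assumes a Unix-like system where `os.nice` is available, and on Linux it pins every worker to a single CPU so they really do compete. Results will vary with the kernel and scheduler.

```python
import subprocess
import sys

# Program run in each child process: pin to one CPU (where supported), raise
# the niceness, spin until the deadline, then print niceness and CPU seconds.
WORKER = r"""
import os, sys, time
increment, seconds = int(sys.argv[1]), float(sys.argv[2])
if hasattr(os, "sched_setaffinity"):  # Linux only
    os.sched_setaffinity(0, {min(os.sched_getaffinity(0))})
niceness = os.nice(increment)  # positive increments need no privileges
deadline = time.monotonic() + seconds
while time.monotonic() < deadline:
    pass
t = os.times()
print(niceness, t.user + t.system)
"""


def compare_niceness(increments, seconds=5.0):
    """Run one busy worker per increment; return [(niceness, cpu_seconds)]."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(inc), str(seconds)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for inc in increments
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        niceness, cpu = out.split()
        results.append((int(niceness), float(cpu)))
    return results
```

For example, `compare_niceness([0, 0, 1, 10, 19])` returns one `(niceness, cpu_seconds)` pair per worker, in the order given, so you can see directly how the CPU time was split between the niceness levels.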