I'm experimenting with the realtime capabilities of the Raspberry Pi 3/4 and have written the following C++ program to test them.
// Compile with:
// g++ realtime_task.cpp -o realtime_task -lrt && sudo setcap CAP_SYS_NICE+ep realtime_task
#include <cstdio>
#include <cstdint>
#include <limits>
#include <sched.h>
#include <unistd.h>
#include <chrono>
#include <algorithm>

using namespace std;
using namespace chrono;
using namespace chrono_literals;

int main(int argc, char **argv)
{
    // Pin this process to the 4th core (core 3).
    pid_t pid = getpid();
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(3, &cpuset);

    int result = sched_setaffinity(pid, sizeof(cpu_set_t), &cpuset);
    if (result != 0)
    {
        perror("`sched_setaffinity` failed");
    }

    struct sched_param param;

    // Use SCHED_FIFO
    param.sched_priority = 99;
    result = sched_setscheduler(pid, SCHED_FIFO, &param);

    // Use SCHED_OTHER
    // param.sched_priority = 0;
    // result = sched_setscheduler(pid, SCHED_OTHER, &param);

    if (result != 0)
    {
        perror("`sched_setscheduler` failed");
    }

    uint32_t count = 0;
    uint32_t total_loop_time_us = 0;
    uint32_t min_loop_time_us = numeric_limits<uint32_t>::max();
    uint32_t avg_loop_time_us = 0;
    uint32_t max_loop_time_us = 0;

    while (true)
    {
        count++;
        auto start = steady_clock::now();

        // Spin-wait until 500 microseconds have elapsed.
        auto loop_time_us = duration_cast<microseconds>(steady_clock::now() - start).count();
        while (loop_time_us < 500)
        {
            loop_time_us = duration_cast<microseconds>(steady_clock::now() - start).count();
        }

        min_loop_time_us = min((uint32_t)loop_time_us, min_loop_time_us);
        total_loop_time_us += loop_time_us;
        avg_loop_time_us = total_loop_time_us / count;
        max_loop_time_us = max((uint32_t)loop_time_us, max_loop_time_us);

        if ((count % 1000) == 0)
        {
            printf("%u %u %u\r", min_loop_time_us, avg_loop_time_us, max_loop_time_us);
            fflush(stdout);
        }
    }

    return 0;
}
I've patched the kernel with the PREEMPT_RT patch; uname reports 5.15.84-v8+ #1613 SMP PREEMPT and everything runs fine. The kernel command line includes isolcpus=3 irqaffinity=0-2 to isolate the 4th core (core 3) and reserve it for the program above, and htop confirms that my program is the only process running on that core.
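As a sanity check, the pinning can also be confirmed from inside the program itself. Here is a minimal sketch (not part of the listing above) using sched_getcpu and sched_getaffinity:

// Sketch: report the CPU the process is currently running on and the
// affinity mask it was given. Not part of the original program.
#include <cstdio>
#include <sched.h>

int main()
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0)
    {
        printf("running on CPU %d, allowed CPUs:", sched_getcpu());
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        {
            if (CPU_ISSET(cpu, &mask))
            {
                printf(" %d", cpu);
            }
        }
        printf("\n");
    }
    return 0;
}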
When using the SCHED_FIFO policy, it reports the following minimum, average, and maximum loop times in microseconds ...

MIN  AVG  MAX
500  522  50042

... and htop reports:
CPU▽ CTXT PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
3 1 37238 pi RT 0 4672 1264 1108 R 97.5 0.0 0:07.57 ./realtime_task
When using the SCHED_OTHER policy, it reports the following minimum, average, and maximum loop times in microseconds ...

MIN  AVG  MAX
500  500  524

... and htop reports:
CPU▽ CTXT PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
3 0 36065 pi 20 0 4672 1260 1108 R 100. 0.0 1:30.16 ./realtime_task
This is the opposite of what I expect: I expect SCHED_FIFO to give the lower maximum loop time and fewer context switches. Why am I getting these results?
The problem turned out to be realtime throttling. When throttling occurs, a message ("sched: RT throttling activated") appears in the dmesg output. Once throttling was disabled with echo -1 > /proc/sys/kernel/sched_rt_runtime_us, the SCHED_FIFO policy worked as expected. With a stressor program running on cores 0-2, SCHED_FIFO performs much better than SCHED_OTHER.
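For reference, the throttling budget is the ratio of sched_rt_runtime_us to sched_rt_period_us (950000 / 1000000 = 95% by default), so a realtime task that spins without ever yielding exceeds the budget and is throttled for the remainder of each period. A small standalone sketch (not part of the answer's code) that reads and prints the current values:

// Sketch: read the RT throttling parameters from /proc.
// Assumes a Linux system where these proc files exist.
#include <cstdio>
#include <fstream>

int main()
{
    std::ifstream runtime_file("/proc/sys/kernel/sched_rt_runtime_us");
    std::ifstream period_file("/proc/sys/kernel/sched_rt_period_us");
    long runtime_us = 0, period_us = 0;
    runtime_file >> runtime_us;
    period_file >> period_us;

    // A runtime of -1 means realtime throttling is disabled entirely.
    if (runtime_us < 0)
    {
        printf("RT throttling is disabled\n");
    }
    else
    {
        printf("RT tasks may use %ld of every %ld microseconds (%.1f%%)\n",
               runtime_us, period_us, 100.0 * runtime_us / period_us);
    }
    return 0;
}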
However, there is a better way to avoid realtime throttling without disabling it for the entire system. In the program listing from the original question, change this code ...
auto loop_time_us = duration_cast<microseconds>(steady_clock::now() - start).count();
while (loop_time_us < 500)
{
    loop_time_us = duration_cast<microseconds>(steady_clock::now() - start).count();
}
... to this code ...
this_thread::sleep_for(400us);   // requires #include <thread>
auto loop_time_us = duration_cast<microseconds>(steady_clock::now() - start).count();
while (loop_time_us < 500)
{
    loop_time_us = duration_cast<microseconds>(steady_clock::now() - start).count();
}
The this_thread::sleep_for call prevents the process from consuming its entire allotted runtime within each /proc/sys/kernel/sched_rt_period_us window, and thus prevents realtime throttling. Since sleep_for is not very precise, you deliberately sleep for less than the full 500 microseconds and use the while(loop_time_us < 500) loop to fill the remaining ~100 microseconds with a more precise spin-wait. This method also prevents the realtime core from turning into a space heater.
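Putting it together, the main loop of the original listing ends up looking roughly like this (a sketch of the combined change; it assumes #include <thread> has been added and reuses the counters declared earlier in the program):

while (true)
{
    count++;
    auto start = steady_clock::now();

    // Sleep for most of the 500 us period so the task yields the CPU and
    // never exhausts the runtime budget within sched_rt_period_us ...
    this_thread::sleep_for(400us);

    // ... then spin-wait for the remainder of the period, since the spin
    // loop gives much finer timing than sleep_for alone.
    auto loop_time_us = duration_cast<microseconds>(steady_clock::now() - start).count();
    while (loop_time_us < 500)
    {
        loop_time_us = duration_cast<microseconds>(steady_clock::now() - start).count();
    }

    min_loop_time_us = min((uint32_t)loop_time_us, min_loop_time_us);
    total_loop_time_us += loop_time_us;
    avg_loop_time_us = total_loop_time_us / count;
    max_loop_time_us = max((uint32_t)loop_time_us, max_loop_time_us);

    if ((count % 1000) == 0)
    {
        printf("%u %u %u\r", min_loop_time_us, avg_loop_time_us, max_loop_time_us);
        fflush(stdout);
    }
}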