
RTOS: Context Switching - Calculate TCB Lookup vs. Memory Access


Currently I am writing a small, lightweight RTOS for the STM32F4VE with a Cortex-M4 processor. The context switching in round-robin manner between several processes works fine, but it is a hobby of mine to optimize the kernel mechanisms as much as possible. The TCBs are stacked on top of each other at the bottom of the SRAM, in a reserved area.

On each context switch I look up the next TCB like this: ((pid + 1) * TCB_Size) + TCB_BASE_ADRESS.

If the pid is equal to the number of tasks, I reset it to 0. This is done with a ternary operator instead of the more expensive modulo operation.
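As a sketch of this first variant (the TCB_SIZE, NUM_TASKS and base-address values are hypothetical placeholders; TCB_BASE_ADRESS keeps the question's spelling), the multiplication plus ternary wrap-around looks roughly like this:

```c
#include <stdint.h>

#define TCB_BASE_ADRESS  0x20000000u  /* hypothetical reserved area at the bottom of SRAM */
#define TCB_SIZE         64u          /* hypothetical size of one TCB in bytes */
#define NUM_TASKS        4u           /* hypothetical number of tasks */

/* Variant 1: recompute the next TCB's address on every context switch. */
static uint32_t next_tcb_address(uint32_t pid)
{
    /* ternary wrap-around instead of (pid + 1) % NUM_TASKS */
    uint32_t next = (pid + 1u == NUM_TASKS) ? 0u : pid + 1u;
    return TCB_BASE_ADRESS + next * TCB_SIZE;
}
```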

When switching e.g. every 8 ms, the CPU has to do this multiplication every time. I wonder if it would be more efficient to pre-calculate these TCB addresses when spawning each task and write them directly to memory.

The second variant would access memory twice per context switch: fetch the address of the TCB, then fetch the TCB itself. The first variant instead pays for the "expensive" multiplication.
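The second variant, sketched under the same hypothetical names, would fill a pointer table once at spawn time and then do the two loads per switch:

```c
#include <stdint.h>

#define NUM_TASKS 4u   /* hypothetical number of tasks */

typedef struct {
    uint32_t sp;       /* saved stack pointer (placeholder for the real TCB contents) */
} tcb_t;

static tcb_t  tcbs[NUM_TASKS];        /* the reserved TCB area */
static tcb_t *tcb_table[NUM_TASKS];   /* addresses pre-calculated at spawn time */

static void spawn_task(uint32_t pid)
{
    tcb_table[pid] = &tcbs[pid];      /* the address arithmetic happens once, here */
}

/* Variant 2: two memory accesses - load the pointer, then access the TCB. */
static tcb_t *next_tcb(uint32_t pid)
{
    uint32_t next = (pid + 1u == NUM_TASKS) ? 0u : pid + 1u;
    return tcb_table[next];
}
```

For what it's worth, the Cortex-M4 has a single-cycle hardware multiplier, so the multiplication in variant 1 is unlikely to be slower than the extra dependent load in variant 2; a benchmark would settle it for this specific memory setup.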

Which variant would be more efficient? If nobody has a definitive answer, I will rewrite the concept and do a simple benchmark.

Thanks!


Solution

  • On each context switch i search the next TCB like this: ((pid + 1) * TCB_Size) + TCB_BASE_ADRESS .

    Just use a circular linked list; like:

        pointer_to_next_task_TCB = pointer_to_this_task_TCB->next;
        switch_to_task(pointer_to_next_task_TCB);
    

    ... where pointer_to_this_task_TCB is a global variable (single-CPU only) or a CPU-specific variable; and where switch_to_task(pointer_to_next_task_TCB); ensures that pointer_to_this_task_TCB = pointer_to_next_task_TCB; is done as part of (or immediately after) the task switch.

    Note that when tasks block (sleep, waiting for disk IO, waiting to acquire a mutex, ...) you need to remove them from the linked list, to ensure they're not given CPU time by the scheduler, and then do a task switch ASAP (without wasting CPU time until the task's time slice ends). When tasks unblock (a time delay expires, data arrives from disk, ...) they need to be inserted back into the linked list so that the scheduler will give them CPU time again; and they need to be inserted at the right place (the current end of the list) to prevent denial of service/CPU hogs (e.g. tasks deliberately blocking for an extremely tiny amount of time to continually get put back at the start of the list and get a whole new time slice while other tasks get no CPU time).

    Don't forget that under normal conditions most tasks are blocked most of the time (and their TCBs are not on the scheduler's linked list most of the time); and the list is almost never in order of PID.

    For example; if there are 100 tasks where 96 are blocked waiting for something, then the scheduler's linked list might be "PID 9, PID 74, PID 31, PID 46, then back to PID 9 again".
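    The circular-list scheme above, including blocking and unblocking, can be sketched roughly like this. A doubly linked ring is assumed here (an extra prev pointer beyond the answer's ->next) so that removal and insert-before-current are O(1); a singly linked version works the same way conceptually:

        #include <stddef.h>

        /* Minimal sketch of a TCB that carries its own scheduling links. */
        typedef struct tcb {
            void       *sp;      /* saved stack pointer (placeholder) */
            struct tcb *next;    /* next runnable task in the ring */
            struct tcb *prev;    /* previous runnable task in the ring */
        } tcb_t;

        static tcb_t *pointer_to_this_task_TCB;   /* currently running task */

        /* The whole "search" for the next task is a single pointer load. */
        static void schedule_next(void)
        {
            pointer_to_this_task_TCB = pointer_to_this_task_TCB->next;
            /* a real kernel would now restore the context from ->sp */
        }

        /* Block: unlink the task so the scheduler never sees it. */
        static void block(tcb_t *t)
        {
            t->prev->next = t->next;
            t->next->prev = t->prev;
        }

        /* Unblock: insert at the current "end" of the list, i.e. just
           before the running task, so the task waits a full round
           (prevents the block-briefly-to-jump-the-queue trick above). */
        static void unblock(tcb_t *t)
        {
            tcb_t *cur = pointer_to_this_task_TCB;
            t->prev = cur->prev;
            t->next = cur;
            cur->prev->next = t;
            cur->prev = t;
        }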