Tags: c, x86, synchronization, memory-barriers, barrier

Test program for CPU out of order effect


I wrote a multi-threaded program to demonstrate the out-of-order effect of Intel processors. The program is attached at the end of this post. I expected handler1 to print x as either 42 or 0. However, the actual result is always 42, which means that the out-of-order effect never shows up.

I compiled the program with the command "gcc -pthread -O0 out-of-order-test.c" and ran it on Ubuntu 12.04 LTS (Linux kernel 3.8.0-29-generic) on an Intel Ivy Bridge processor, an Intel(R) Xeon(R) CPU E5-1650 v2.

Does anyone know what I should do to see the out-of-order effect?

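#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

// volatile keeps the compiler from hoisting the load of f out of the
// spin loop; it does not add any CPU memory fence.
volatile int f = 0, x = 0;

void* handler1(void *data)
{
    while (f == 0);
    // Memory fence required here
    printf("%d\n", x);
    return NULL;
}

void* handler2(void *data)
{
    x = 42;
    // Memory fence required here
    f = 1;
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid1, tid2;

    pthread_create(&tid1, NULL, handler1, NULL);
    pthread_create(&tid2, NULL, handler2, NULL);

    // Wait for both threads instead of sleeping for a fixed time.
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    return 0;
}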

Solution

  • PLEASE NOTE: The following only addresses MEMORY reordering. To my knowledge, you cannot observe out-of-order execution outside the pipeline, since that would mean the CPU had failed to adhere to its interface (e.g., you should report it to Intel, as it would be a bug). Specifically, there would have to be a failure in the reorder buffer and instruction-retirement bookkeeping.

    According to Intel's documentation (specifically the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, section 8.2.3.4):

    The Intel-64 memory-ordering model allows a load to be reordered with an earlier store to a different location.

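    It also specifies (I'm summarizing, but all of this is available in section 8.2 Memory Ordering, with examples in 8.2.3) that loads are never reordered with other loads, stores are never reordered with other stores, and stores are never reordered with earlier loads. In other words, Intel 64 behaves as if there were implicit LoadLoad, StoreStore, and LoadStore fences between these operations; the only reordering allowed is a later load passing an earlier store to a different location (StoreLoad). That is also why the program in the question always prints 42: it relies only on store-store ordering in handler2 and load-load ordering in handler1, and the hardware preserves both.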
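    For comparison, the fences that the question's comments ask for can be written portably with C11 <stdatomic.h> release/acquire operations. This is a minimal sketch (run_message_passing, reader, writer and observed are hypothetical names, not from the question); it also removes the data race on f that makes the original spin loop undefined behavior in standard C. On x86, GCC and Clang emit plain mov instructions for a release store and an acquire load, which is consistent with the ordering rules above: here the atomics mainly stop the compiler from reordering.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Message-passing pattern from the question, with C11 atomics. */
static atomic_int f;   /* "ready" flag */
static int x;          /* plain payload, published through f */
static int observed;   /* value of x that the reader saw */

static void *reader(void *data)
{
    (void)data;
    /* Acquire load: once f == 1 is seen, the earlier write to x is visible. */
    while (atomic_load_explicit(&f, memory_order_acquire) == 0)
        ;
    observed = x;
    return NULL;
}

static void *writer(void *data)
{
    (void)data;
    x = 42;
    /* Release store: the write to x is ordered before the write to f. */
    atomic_store_explicit(&f, 1, memory_order_release);
    return NULL;
}

/* Runs one round and returns the value of x observed by the reader. */
int run_message_passing(void)
{
    pthread_t t1, t2;
    atomic_store(&f, 0);
    x = 0;
    observed = -1;
    pthread_create(&t1, NULL, reader, NULL);
    pthread_create(&t2, NULL, writer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);   /* join makes `observed` safe to read */
    return observed;
}
```

    With release/acquire in place the reader can only ever see 42, on x86 and on weaker architectures alike.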

    To observe memory reordering, you just need to implement that example carefully enough to actually observe the effects. Here is a link to a full implementation I wrote that demonstrates this. (I will follow up with more details in the accompanying post here.)

    Essentially the first thread (processor_0 from the example) does this:

        x = 1;
    #if CPU_FENCE
        __cpu_fence();
    #endif
        r1 = y;
    

    inside a while loop in its own thread (pinned to a CPU and scheduled with SCHED_FIFO at priority 99).

    The second thread (the observer, in my demo) does this:

        y = 1;
    #if CPU_FENCE
        __cpu_fence();
    #endif
        r2 = x;
    

    also in a while loop in its own thread with the same scheduler settings.
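    The pinning and scheduler settings are not shown above. A minimal Linux-only sketch of one way to apply them, assuming glibc's pthread_setaffinity_np and pthread_setschedparam (pin_and_prioritize is a hypothetical helper, not the answer's code):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: pin `thread` to a single CPU and request the
 * SCHED_FIFO real-time policy at priority 99 (the Linux maximum).
 * Returns 0 on success, otherwise the error number from the first call
 * that failed. SCHED_FIFO usually needs root or CAP_SYS_NICE; without it
 * pthread_setschedparam returns EPERM. */
static int pin_and_prioritize(pthread_t thread, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* Restrict the thread to exactly one CPU. */
    int rc = pthread_setaffinity_np(thread, sizeof(set), &set);
    if (rc != 0)
        return rc;

    /* Real-time FIFO policy: the thread preempts ordinary SCHED_OTHER
     * threads on that CPU. */
    struct sched_param param = { .sched_priority = 99 };
    return pthread_setschedparam(thread, SCHED_FIFO, &param);
}
```

    Each worker thread would call this on itself (or the creator would call it on the new pthread_t) with a different CPU number, which is presumably why the demo is run with sudo.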

    Reorders are detected like this (exactly as specified in the example):

        if (r1 == 0 && r2 == 0)
            ++reorders;
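    A self-contained sketch of the same store-buffer test in portable C11, assuming POSIX barriers are used to line the two threads up on every attempt (the answer's own implementation may differ; run_store_buffer_test and the other names are hypothetical). atomic_thread_fence(memory_order_seq_cst) plays the role of CPU_FENCE, and on x86 GCC and Clang emit mfence or a locked instruction for it. Without it, atomic_signal_fence only stops the compiler from swapping the store and the load, so any r1 == 0 && r2 == 0 outcome comes from the CPU's store buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int X, Y;          /* the two shared locations */
static int r1, r2;               /* per-thread results */
static int use_fence, iterations;
static pthread_barrier_t start_barrier, end_barrier;

/* Full CPU fence if requested, otherwise a compiler-only barrier. */
static void middle_fence(void)
{
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst);
    else
        atomic_signal_fence(memory_order_seq_cst);
}

static void *thread0(void *arg)
{
    (void)arg;
    for (int i = 0; i < iterations; i++) {
        pthread_barrier_wait(&start_barrier);
        atomic_store_explicit(&X, 1, memory_order_relaxed);
        middle_fence();
        r1 = atomic_load_explicit(&Y, memory_order_relaxed);
        pthread_barrier_wait(&end_barrier);
    }
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    for (int i = 0; i < iterations; i++) {
        pthread_barrier_wait(&start_barrier);
        atomic_store_explicit(&Y, 1, memory_order_relaxed);
        middle_fence();
        r2 = atomic_load_explicit(&X, memory_order_relaxed);
        pthread_barrier_wait(&end_barrier);
    }
    return NULL;
}

/* Runs n attempts and returns how many ended with r1 == 0 && r2 == 0. */
long run_store_buffer_test(int n, int fence)
{
    pthread_t t0, t1;
    long reorders = 0;
    iterations = n;
    use_fence = fence;
    pthread_barrier_init(&start_barrier, NULL, 3);
    pthread_barrier_init(&end_barrier, NULL, 3);
    pthread_create(&t0, NULL, thread0, NULL);
    pthread_create(&t1, NULL, thread1, NULL);
    for (int i = 0; i < n; i++) {
        atomic_store(&X, 0);
        atomic_store(&Y, 0);
        pthread_barrier_wait(&start_barrier);   /* release both threads */
        pthread_barrier_wait(&end_barrier);     /* wait for r1 and r2 */
        if (r1 == 0 && r2 == 0)
            reorders++;
    }
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    pthread_barrier_destroy(&start_barrier);
    pthread_barrier_destroy(&end_barrier);
    return reorders;
}
```

    The barriers add a lot of per-attempt overhead, so the two threads overlap less tightly than in a busy-loop harness and reorders may show up less often. With fence set to 1, the C11 rules for seq_cst fences forbid the r1 == 0 && r2 == 0 outcome, so that run should always report 0.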
    

    With the CPU_FENCE disabled, this is what I see:

    [  0][myles][~/projects/...](master) sudo ./build/ooo
    after 100000 attempts, 754 reorders observed
    

    With the CPU_FENCE enabled (which uses the "heavyweight" mfence instruction) I see:

    [  0][myles][~/projects/...](master) sudo ./build/ooo
    after 100000 attempts, 0 reorders observed
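    The definition of __cpu_fence is not shown. One plausible version, assuming GCC or Clang inline assembly (cpu_fence is a hypothetical stand-in name), issues mfence directly on x86:

```c
/* Hypothetical stand-in for __cpu_fence(). On x86 it issues MFENCE; the
 * "memory" clobber also stops the compiler from moving memory accesses
 * across it. Elsewhere it falls back to GCC's full-barrier builtin. */
#if defined(__x86_64__) || defined(__i386__)
#define cpu_fence() __asm__ __volatile__("mfence" ::: "memory")
#else
#define cpu_fence() __sync_synchronize()
#endif
```

    Placing cpu_fence() between the store and the load in each thread is what rules out the StoreLoad reordering counted above.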
    

    I hope this clarifies things for you!