c atomic inline-assembly compare-and-swap

How to implement atomic compare and swap using 'cmpxchg8b' assembly instruction in c?

I am trying to implement a lock free linked list. For that I need to implement atomic compare and swap instruction using inline assembly in C. I do know that I need to use the cmpxchg8b instruction for x86, however I am not able to do it.

My node structure is as follows:

typedef struct node
{
    int data;
    struct node * next;
    struct node * backlink;
}node_lf;

The struct node * next pointer also holds additional information in the last 2 bits (mark and flag bits)

The compare and swap I need to implement is like this:

node_lf *cs (cs_arg * address, cs_arg *old_val, cs_arg *new_val )
{
    node_lf *value = (address->node)->next;
    if ((address->node->next) == old_val->node)
    {
        (address->node)->next = new_val->node;
    }
    return value;
}

The cs_arg struct is as follows:

typedef struct csArg
{
    node_lf * node;
}cs_arg;

My implementation:

static inline node_lf* cs(cs_arg * address, cs_arg *old_val, cs_arg *new_val)
{
    node_lf * value;
    __asm__ __volatile__("lock cmpxchg8b %0; "
                :"=q"(value)
                :"a"(address->node->next), "b"(old_val->node), "c"(new_val->node)
                :"memory");
    return value;
}

This gives an error message: ../list.c:86: Error: operand type mismatch for 'cmpxchg8b' make: *** [list.o] Error 1

Please help me to solve this problem.

Solution

Let's take this one step at a time. First of all, let's trying using __sync_val_compare_and_swap in a simple case:

#include <stdio.h>

typedef struct node
 {
    int data;
    struct node * next;
    struct node * backlink;
 }node_lf;

int cs1(node_lf *address, int old_val, int new_val)
{
    int *p2 = &address->data;
    int p3 = __sync_val_compare_and_swap(p2, old_val, new_val);  

    return p3;
}

int main()
{
   node_lf n;

   n.data = 17;

   int res = cs1(&n, 1, 2);
   printf("Old Value: %d  Cur Value: %d\n", res, n.data);

   res = cs1(&n, 17, 2);
   printf("Old Value: %d  Cur Value: %d\n", res, n.data);
}

So, the code here is pretty simple. We are manipulating node.data. We start by initializing it to 17. Then we try to do a cas to change it to 2. However, in the first call, we give the incorrect value for oldval (1 vs 17). As a result, __sync_val_compare_and_swap (correctly!) does not perform the swap, but does return the current value. The second call, which does give the correct old value, does perform the swap and returns the old value.

So, when run we get:

Old Value: 17  Cur Value: 17 <- Didn't change the value
Old Value: 17  Cur Value: 2  <- Did change the value

Simple enough, but not quite what you are looking for. Let's bring it a bit closer:

#include <stdio.h>

typedef struct node
 {
    int data;
    struct node * next;
    struct node * backlink;
 }node_lf;

typedef struct csArg
{
    node_lf * node;
}cs_arg;

node_lf *cs2(node_lf *address, const cs_arg *old_val, const cs_arg *new_val)
{
    unsigned long long *p2 = (unsigned long long *)&address->next;
    unsigned long long p3 = __sync_val_compare_and_swap (p2, (unsigned long long)old_val->node, (unsigned long long)new_val->node);  

    return (node *)p3;
}

int main()
{
   node_lf n;
   cs_arg oldval, newval;

   n.next = (node *)18;
   oldval.node = (node *)1;
   newval.node = (node *)2;

   node *res2 = cs2(&n, &oldval, &newval);
   printf("Old Value: %p  Cur Value: %p\n", res2, n.next);

   oldval.node = (node *)18;
   res2 = cs2(&n, &oldval, &newval);
   printf("Old Value: %p  Cur Value: %p\n", res2, n.next);
}

In this case, we are trying to update node.next, by using the nodes in 2 cs_args. The main thing to note here is that since __sync_val_compare_and_swap works on integral types, we need to cast the pointers to be an appropriately-sized int. The assumption here is we are using a 64bit compiler, so pointers are 8 bytes (the same as unsigned long long).

There isn't really any need to use a cs_arg structure here. Using node* should work fine. But you used it, so presumably there is some long-term plan here.

Running this we see:

Old Value: 0000000000000012  Cur Value: 0000000000000012  <- Didn't change the value
Old Value: 0000000000000012  Cur Value: 0000000000000002  <- Did change the value

Note that it is possible to bump this up to 16 bytes (using __int128 and cmpxchg16b) if you are running x64. However, there are some tricks to be aware of. For example, the data must be 16byte aligned (which node.next is NOT).

Now, having done all that, your requirement that you "need the old value of address->node->next as a return value" seems suspect. Looking at the code above, how can you tell if the cas failed? If it works it returns 18, and if it fails it returns 18. Since that's what you said you wanted, that's what I tried to provide.

But I assume you do want to know if it worked (possibly in order to try again). Such being the case, perhaps something more like this:

#include <stdio.h>

typedef struct node
 {
    int data;
    struct node * next;
    struct node * backlink;
 }node_lf;

bool cs2(node_lf *address, const node *old_val, const node *new_val)
{
    unsigned long long *p2 = (unsigned long long *)&address->next;
    return __sync_bool_compare_and_swap (p2, (unsigned long long)old_val, (unsigned long long)new_val);
}

int main()
{
   node_lf n;
   node *oldval, *newval;

   n.next = (node *)18;
   oldval = (node *)1;
   newval = (node *)2;

   bool res2;

   do {

      printf("Trying to change %p from %p to %p: ", n.next, oldval, newval);
      res2 = cs2(&n, oldval, newval);
      printf("Worked: %d  Cur Value: %p\n", res2, n.next);
      if (res2)
         break;

      oldval = n.next;
   } while (1);
}

When you exit the loop, oldval will be what was there before (has to be or the cas would have failed), newval will be what actually got written. Note that if this truly were multi-threaded, there is no guarantee that newval would be the same as n.next, since another thread could already have come along and changed it again.

While using assembler may be able to save you an instruction or two, the win in terms of readability, maintainability, portability, etc is almost certainly worth the cost.

Unless this is a class project where the teacher requires inline asm. In that case, look at https://stackoverflow.com/a/37825052/2189500.