Pack/unpack short into int

I want to pack/unpack two signed 16-bit integers into a 32-bit integer. However, I can't quite get it to work.

Any ideas as to what I might be doing wrong?

template <typename T>
int read_s16(T& arr, int idx) restrict(amp)
{
    return static_cast<int>((arr[idx/2] >> ((idx % 2) * 16)) << 16) >> 16;
}

template<typename T>
void write_s16(T& arr, int idx, int val) restrict(amp)
{
    // NOTE: arr is zero initialized
    concurrency::atomic_fetch_or(&arr[idx/2], (static_cast<unsigned int>(val) & 0xFFFF) << ((idx % 2) * 16));
}

The function return types and arguments must stay as I have defined them. The lo and hi halves are written from different threads (hence the atomic_fetch_or), and the read must return a single 32-bit value.

16-bit integer arithmetic is not supported on the target platform.

Example:

array<int> ar(1); // Container

write_s16(ar, 0, -16);
write_s16(ar, 1, 5);

assert(read_s16(ar, 0) == -16);
assert(read_s16(ar, 1) == 5);

Solution

These atomic operations in C++ AMP also have the following limitations:

You should not mix atomic and normal (non-atomic) reads and writes. Normal reads may not see the results of atomic writes to the same memory location. Normal writes should not be mixed with atomic writes to the same memory location. If your program does not comply with these criteria then this will lead to an undefined result.
Atomic operations do not imply a memory fence of any sort. Atomic operations may be reordered. This differs from the behavior of interlocked operations in C++.

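It would seem like you are violating the first of these: write_s16 updates arr[idx/2] with atomic_fetch_or, while read_s16 reads the same element with a normal (non-atomic) read, so the read may not see the result of the write.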
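
To illustrate what keeping both sides atomic looks like, here is a minimal host-side sketch in standard C++. It is not C++ AMP code: it assumes std::atomic<std::uint32_t> as the storage element in place of the AMP array, drops restrict(amp), and the example() function is a hypothetical helper that replays the example from the question. The sign extension is done with arithmetic on the masked half rather than by shifting a signed value.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

// Host-side analogue of write_s16 (assumption: T holds std::atomic<std::uint32_t>).
// As in the question, the target half must still be zero when it is written.
template <typename T>
void write_s16(T& arr, int idx, int val)
{
    // Keep only the low 16 bits, then move them into the low or high half.
    const std::uint32_t bits = static_cast<std::uint32_t>(val) & 0xFFFFu;
    arr[idx / 2].fetch_or(bits << ((idx % 2) * 16));
}

// Host-side analogue of read_s16: the read is an atomic load, not a plain read.
template <typename T>
int read_s16(T& arr, int idx)
{
    const std::uint32_t word = arr[idx / 2].load();
    const std::uint32_t half = (word >> ((idx % 2) * 16)) & 0xFFFFu;
    // Sign-extend the 16-bit pattern: flip the sign bit, then subtract its weight.
    return static_cast<int>(half ^ 0x8000u) - 0x8000;
}

// Hypothetical helper replaying the example from the question.
inline void example()
{
    std::array<std::atomic<std::uint32_t>, 1> ar{}; // zero-initialized container

    write_s16(ar, 0, -16);
    write_s16(ar, 1, 5);

    assert(read_s16(ar, 0) == -16);
    assert(read_s16(ar, 1) == 5);
}
```

Whether an equivalent change (making the read go through an atomic operation as well) is enough on a given C++ AMP accelerator is not verified here; the sketch only shows the packing logic with atomic access on both the read and the write.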