c++ c performance bit-manipulation operation

Fastest way to split a word into two bytes

What is the fastest way to split a word into two bytes?

short s = 0x3210;
char c1 = s >> 8;
char c2 = s & 0x00ff;

versus

short s = 0x3210;
char c1 = s >> 8;
char c2 = (s << 8) >> 8;

Edit

How about

short s = 0x3210;
char* c = (char*)&s; // where c1 = c[0] and c2 = c[1]

Solution

I'm 99.9% sure the first one is at least as fast as the second on nearly all architectures. On some architectures it makes no difference (they are equal), and on several the second will be slower.

The main reason I'd say the second is slower is that it takes two shifts to come up with the c2 value, and they depend on each other: the processor can't start the second shift until it has finished the first.

Also, the compiler may well be able to do other clever stuff with the first one, if the target has instructions for it. For example, an x86 processor can load s into AX, then store AH into c1 and AL into c2, with no extra instructions beyond the stores. The second one is much less likely to be a "known common pattern": I certainly have never seen that variant used in code, whereas the shift/and method is very common, often in "pixel loops", which makes it critical for compilers to optimise it well.
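To make the comparison concrete, here is a minimal sketch of both variants as small functions. It assumes unsigned fixed-width types (std::uint16_t and std::uint8_t) rather than the short and char from the question, so the shifts are well defined for every input. The function names are made up for illustration. The second variant truncates back to 16 bits after the left shift; without that step, integer promotion would make the two shifts cancel out.

```cpp
#include <cassert>
#include <cstdint>

// Variant 1: one shift for the high byte, one mask for the low byte.
// The two results do not depend on each other.
constexpr std::uint8_t high_byte(std::uint16_t w) {
    return static_cast<std::uint8_t>(w >> 8);
}

constexpr std::uint8_t low_byte_mask(std::uint16_t w) {
    return static_cast<std::uint8_t>(w & 0x00ffu);
}

// Variant 2: low byte via two dependent shifts. The intermediate cast
// drops the bits pushed above bit 15, because w << 8 is computed in int.
constexpr std::uint8_t low_byte_shifts(std::uint16_t w) {
    return static_cast<std::uint8_t>(
        static_cast<std::uint16_t>(w << 8) >> 8);
}

// Checks that both ways of extracting the low byte agree for every
// possible 16-bit input.
inline void check_variants_agree() {
    for (std::uint32_t w = 0; w <= 0xffffu; ++w) {
        const auto v = static_cast<std::uint16_t>(w);
        assert(low_byte_mask(v) == low_byte_shifts(v));
    }
}
```

With s = 0x3210, high_byte gives 0x32 and both low-byte functions give 0x10. An optimising compiler may well reduce both low-byte functions to the same instruction, which is one more reason to measure instead of guessing.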

As always, measure, measure and measure again. And unless you are ONLY interested in your particular machine's performance, try it on different models and manufacturers of processors, so you don't end up with something that is 5% faster on your model of machine but 20% slower on another.
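As a starting point for measuring, here is a rough timing sketch, not a rigorous benchmark. It assumes std::chrono::steady_clock is precise enough on your platform, and all names (TimingResult, time_low_byte, low_by_mask, low_by_shifts, run_comparison) are hypothetical. The checksum is only there so the loop has an observable result; an optimising compiler may still inline and simplify both variants to identical code, in which case the timings will simply match.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>

struct TimingResult {
    std::uint64_t checksum;            // sum of all extracted low bytes
    std::chrono::nanoseconds elapsed;  // wall-clock time for the loops
};

// Runs fn over every 16-bit value, `rounds` times, and times the whole run.
template <typename LowByteFn>
TimingResult time_low_byte(LowByteFn fn, std::uint32_t rounds) {
    std::uint64_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint32_t r = 0; r < rounds; ++r) {
        for (std::uint32_t w = 0; w <= 0xffffu; ++w) {
            checksum += fn(static_cast<std::uint16_t>(w));
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return {checksum,
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}

// Low byte via a single mask.
inline std::uint8_t low_by_mask(std::uint16_t w) {
    return static_cast<std::uint8_t>(w & 0x00ffu);
}

// Low byte via two dependent shifts, truncating to 16 bits in between.
inline std::uint8_t low_by_shifts(std::uint16_t w) {
    return static_cast<std::uint8_t>(static_cast<std::uint16_t>(w << 8) >> 8);
}

// Times both variants, checks they computed the same thing, prints the times.
inline void run_comparison(std::uint32_t rounds) {
    const TimingResult mask = time_low_byte(low_by_mask, rounds);
    const TimingResult shifts = time_low_byte(low_by_shifts, rounds);
    assert(mask.checksum == shifts.checksum);
    std::printf("mask:   %lld ns\n", static_cast<long long>(mask.elapsed.count()));
    std::printf("shifts: %lld ns\n", static_cast<long long>(shifts.elapsed.count()));
}
```

Call run_comparison from main with a round count large enough to give stable numbers, repeat the run several times, and try it at the optimisation level you actually ship with. Then do the same on the other processors you care about.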