How to shift 32 bit int by 32 (yet again)

Ok, so I know that typically left- and right- shifts are well defined only for values 0..31. I was thinking how to best extend this to include 32, which simplifies some algorithms. I came up with:

 int32 << n & (n-32) >> 5

Which seems to work. Question is, is it guaranteed to work on any architecture (C, C++, Java), and can it be done more effectively?

Solution

In Java it's guaranteed to work if those variables are of type int, since >> in Java does an arithmetic right shift and shifting more than 31 also has defined behavior. But beware of operator precedence

int lshift(int x, int n)
{
    return (x << n) & ((n-32) >> 5);
}

This will work for shift count up to 32. But it can be modified to include any int values with shift counts larger than 31 return 0

return (x << n) & ((n-32) >> 31);

However in C and C++ the size of int type and the behavior of >> operator is implementation defined. Most (if not all) modern implementations implement it as arithmetic shift for signed types though. Besides, the behavior of shifting more than the variable width is undefined. Worse yet, signed overflow invokes UB so even a left shift by 31 is also UB (until C++14). Therefore to get a well defined output you need to

Use a unsigned fixed-width type like uint32_t (so x << 31 isn't UB)
Use a compiler that emits arithmetic right shift instruction for >> and use a signed type for n, or implement the arithmetic shift yourself
Mask the shift amount to limit it to 5 bits for int32_t

The result would be

uint32_t lshift(uint32_t x, int32_t n)
{
    return (x << (n & 0x1F)) & ((n-32) >> 31);
}

If the architecture supports conditional instructions like x86 or ARM then the following way may be faster

return n < 32 ? x << n : 0;

On a 64-bit platform you can made this even simpler by shifting in a 64-bit type and then mask. Some 32-bit platforms like ARM does support shifting by 32 so this method is also efficient

return ((uint64_t)x << (n & 0x3F)) & 0xFFFFFFFFU;

You can see the output assembly here. I don't see how it can be improved further