Tags: c++, performance, type-conversion, cpu-architecture, unsigned-integer

16 to 32 bit integer conversion vs performance


I want to load 16 bit unsigned integers from an array and use these values in 32 bit unsigned calculations in C++. I have the choice between storing the values as a 16 bit array (less memory) or as a 32 bit array (more memory consumption).

My code should be compilable with common C++ compilers and run on as many architectures as possible. It would be difficult to do performance measurements and read the generated assembly for many of these combinations, so I am asking for a theoretical examination.

In other words: Under which conditions does a 16 bit to 32 bit unsigned integer conversion usually consume extra CPU cycles? When can I expect to use the memory-reduced 16 bit array without losing CPU cycles?
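For concreteness, the two layouts under consideration might look like this (the array names and size are illustrative, not taken from the question):

```cpp
#include <cstdint>

// Option 1: compact 16 bit storage; values are widened to 32 bits at the point of use.
std::uint16_t samples16[1024];

// Option 2: 32 bit storage; no widening needed when computing, but twice the memory.
std::uint32_t samples32[1024];
```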


Solution

  • I think all major architectures support loads from memory with sign extension and zero extension. x86, ARM and MIPS definitely have such load instructions. Old architectures and primitive microcontrollers, especially 8-bit and 16-bit ones, may not have them and may therefore need multiple instructions to achieve the same result. If you aren't targeting those, you probably don't need to worry. So, just write portable C/C++ code and be done with it (a short sketch follows below).
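
As a rough illustration of the portable approach the answer recommends, here is a hedged sketch (the function name and signature are my own, not from the answer). On mainstream x86 and ARM targets, compilers typically fold the 16-to-32-bit widening into a zero-extending load (e.g. `movzx` on x86, `ldrh` on ARM), so the conversion usually costs no extra instruction:

```cpp
#include <cstdint>
#include <cstddef>

// Sum 16 bit unsigned values into a 32 bit accumulator.
// The widening usually happens as part of the load itself: compilers generally
// emit a zero-extending load rather than a separate conversion instruction.
std::uint32_t sum_u16(const std::uint16_t* data, std::size_t n)
{
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<std::uint32_t>(data[i]); // zero extension, typically free
    return sum;
}
```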