c memcpy

The fastest way to copy a 32-bit array into a 16-bit array?


What is the best way to copy an array of 32-bit elements into an array of 16-bit elements?

I know that "memcpy" uses hardware instructions. But is there a standard function that copies an array while changing the size of each element?

I use GCC for ARMv7 (Cortex-A8).

uint32_t tab32[500];
uint16_t tab16[500];
/* Each assignment keeps only the low 16 bits of the 32-bit element. */
for (int i = 0; i < 500; i++)
    tab16[i] = (uint16_t)tab32[i];

Solution

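  • On an ARM Cortex-A8 with the NEON instruction set, the fastest methods use the interleaving load/store instructions. Here a de-interleaving load splits each 32-bit element into its two 16-bit halves (on a little-endian target, d0 receives the low halves), and only those low halves are stored: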

    vld2.16 {d0,d1}, [r0]!
    vst1.16 {d0}, [r1]!
    

    or the narrowing instructions, which convert a vector of 32-bit integers to a vector of 16-bit integers: vmovn.i32 truncates like the C loop in the question, while the saturating form vqmovn.u32 clamps values that do not fit in 16 bits.

    Both of these methods are available in C through GCC's NEON intrinsics (arm_neon.h). GCC may also be able to autovectorize carefully written C code into exactly these instructions. That basically requires a one-to-one correspondence between the side effects of these instructions and those of the C code.
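The narrowing approach can be sketched with intrinsics roughly as follows. This is a minimal sketch, not a tuned implementation: the function name narrow_u32_to_u16 is hypothetical, the NEON path assumes the compiler defines __ARM_NEON or __ARM_NEON__ (for GCC on ARMv7, typically with -mfpu=neon), and it uses the truncating vmovn_u32 so the result matches the scalar loop from the question. Without NEON the function falls back to the plain loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copies n 32-bit values into 16-bit slots, keeping
   only the low 16 bits of each value (same result as the scalar loop).
   The restrict qualifiers assume the two arrays do not overlap. */
void narrow_u32_to_u16(uint16_t *restrict dst, const uint32_t *restrict src,
                       size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Eight elements per iteration: two 128-bit loads, two truncating
       narrows (vmovn), one 128-bit store. */
    for (; i + 8 <= n; i += 8) {
        uint32x4_t lo = vld1q_u32(src + i);
        uint32x4_t hi = vld1q_u32(src + i + 4);
        vst1q_u16(dst + i, vcombine_u16(vmovn_u32(lo), vmovn_u32(hi)));
    }
#endif

    /* Scalar tail; this is also the whole loop when NEON is unavailable. */
    for (; i < n; i++)
        dst[i] = (uint16_t)src[i];
}
```

For the arrays in the question, the call would be narrow_u32_to_u16(tab16, tab32, 500). Replacing vmovn_u32 with vqmovn_u32 gives the saturating variant, which clamps values above 0xFFFF instead of truncating them, so it no longer matches the original loop for such inputs.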