ios arm64 neon

ARM64 NEON - Store the same uint8x8_t in all four members of a uint8x8x4_t


Given one uint8x8_t, e.g. [100, 100, 100, 100, 200, 200, 200, 200]:

How can that uint8x8_t be stored into all four members of a single uint8x8x4_t with one instruction or intrinsic?

At the moment, we use

uint8x8x4_t out;   // destination
out.val[0] = v;    // v is the uint8x8_t to replicate
out.val[1] = v;
out.val[2] = v;
out.val[3] = v;

// typedef struct uint8x8x4_t {
//   uint8x8_t val[4];
// } uint8x8x4_t;

Solution

  • I don't think NEON has a single instruction that does this, unless you replicate the input data in memory and then load it with a single vld4.

    I haven't tested it, but my gut feeling is that replicating in memory probably won't be an overall saving: I doubt many CPU caches can sustain 64 bytes per clock, and the register moves needed to replicate the copies should be efficient.
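
To make the two options concrete, here is a minimal, untested-for-speed sketch in C. The helper names (replicate_x4_regs, replicate_x4_load, all_members_equal) are hypothetical and not part of NEON. One swap to be aware of: vld4_u8 de-interleaves as it loads, so it would only produce four identical vectors if each byte were repeated four times in memory. The sketch therefore uses vld1_u8_x4 instead, which loads four contiguous 8-byte vectors, and assumes the toolchain's arm_neon.h provides it. The #else branch is a scalar stand-in so the sketch also builds on non-ARM hosts; it is not real NEON.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Scalar stand-ins (NOT real NEON) so the sketch also builds on non-ARM hosts. */
typedef struct { uint8_t b[8]; } uint8x8_t;
typedef struct { uint8x8_t val[4]; } uint8x8x4_t;
static inline uint8x8_t vld1_u8(const uint8_t *p) {
    uint8x8_t v;
    memcpy(v.b, p, 8);
    return v;
}
static inline void vst1_u8(uint8_t *p, uint8x8_t v) {
    memcpy(p, v.b, 8);
}
static inline uint8x8x4_t vld1_u8_x4(const uint8_t *p) {
    uint8x8x4_t r;
    for (int i = 0; i < 4; ++i) r.val[i] = vld1_u8(p + 8 * i);
    return r;
}
#endif

/* Hypothetical helper: copy v into all four members with register moves.
   This is the approach from the question, wrapped in a function. */
static inline uint8x8x4_t replicate_x4_regs(uint8x8_t v) {
    uint8x8x4_t out;
    out.val[0] = v;
    out.val[1] = v;
    out.val[2] = v;
    out.val[3] = v;
    return out;
}

/* Hypothetical helper: load four contiguous copies from memory.
   src must point to 32 readable bytes that already hold the 8-byte
   pattern four times in a row (the caller does the replication). */
static inline uint8x8x4_t replicate_x4_load(const uint8_t *src) {
    return vld1_u8_x4(src);
}

/* Hypothetical check: returns 1 if every member equals the 8 expected bytes. */
static int all_members_equal(uint8x8x4_t x, const uint8_t *expected) {
    for (int i = 0; i < 4; ++i) {
        uint8_t tmp[8];
        vst1_u8(tmp, x.val[i]);
        if (memcmp(tmp, expected, 8) != 0) return 0;
    }
    return 1;
}
```

Both helpers yield the same uint8x8x4_t. The load variant only pays off if the replicated 32-byte buffer already exists, since building it costs stores that the register version avoids. Which one is faster on a given core would need to be measured.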