Tags: arm, simd, endianness, cpu-word, neon

How to swap the byte order for individual words in a vector in ARM/ACLE


I usually write portable C code and try to adhere to a strictly standard-conforming subset of the features supported by compilers.

However, I'm writing code that exploits the ARMv8 Cryptography Extensions to implement SHA-1 (and SHA-256 a few days later). The problem I face is that FIPS 180 specifies the hash algorithms using big-endian byte order, whereas most ARM-based OS ABIs are little-endian.

For a single integer operand in a general-purpose register, I could use the byte-order APIs specified for the next POSIX standard, but I'm working with SIMD registers, since that's where the ARMv8 Crypto instructions operate.
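For the scalar case, here is a minimal sketch that stays within ISO C instead of relying on the POSIX byte-order functions: it assembles a big-endian 32-bit value from individual bytes with shifts, so the result is the same on either host endianness. `load_be32` is a hypothetical helper name, not part of any standard API. GCC and Clang often compile this pattern to a single load plus REV on AArch64, but that is an optimization, not a guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian value from a byte buffer.
 * Uses only shifts and ORs, so it does not depend on host endianness. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

For example, the bytes `61 62 63 80` (the start of the padded SHA-1 block for "abc") yield the message word `0x61626380`.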

So, my question: how do I swap the byte order of the words in a vector register on ARM? Assembly answers are fine, but I'd prefer answers using ACLE intrinsics.


Solution

  • The instructions are:

    • REV16 for byte-swapping short integers,
    • REV32 for byte-swapping 32-bit integers, and
    • REV64 for byte-swapping 64-bit integers.

    More generally, each one reverses the order of elements of any size strictly smaller than the container its name indicates, so it can swap halfwords or words as well as bytes. They're defined in sections C7.2.219 to C7.2.221 of the Arm® Architecture Reference Manual Armv8, for A-profile architecture ("DDI0487G_b_armv8_arm.pdf").

    For example, REV32 can reverse the order of the two short integers within each 32-bit word:

    [00][01][02][03][04][05][06][07]
    to
    [02][03][00][01][06][07][04][05]
    

    Their intrinsics are documented separately, in the Arm Neon Intrinsics Reference ("advsimd-2021Q2.pdf").

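    To byte-swap each 32-bit word in a 128-bit vector, use the vrev32q_u8 intrinsic (REV32 on the .16B arrangement). Because it operates on uint8x16_t, use the matching vreinterpretq_* intrinsics (e.g. vreinterpretq_u8_u32 and vreinterpretq_u32_u8) to reinterpret the operand and result types; these only change the type, not the register contents.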
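A minimal sketch of the intrinsic approach, assuming a little-endian target with Advanced SIMD and the ACLE `<arm_neon.h>` header. `bswap32x4` and `load_be32x4` are hypothetical helper names; the scalar fallback exists only so the file also compiles on non-NEON (or big-endian) builds. `bswap32x4` wraps vrev32q_u8 in the reinterpret casts so it takes and returns uint32x4_t directly.

```c
#include <stdint.h>

/* Assumption: use the NEON path only on little-endian targets with Advanced SIMD. */
#if (defined(__ARM_NEON) || defined(__ARM_NEON__)) && !defined(__ARM_BIG_ENDIAN)
#include <arm_neon.h>
#define HAVE_LE_NEON 1

/* Hypothetical helper: byte-swap each 32-bit lane of a vector register.
 * The reinterprets are type changes only; vrev32q_u8 does the actual work. */
static inline uint32x4_t bswap32x4(uint32x4_t v)
{
    return vreinterpretq_u32_u8(vrev32q_u8(vreinterpretq_u8_u32(v)));
}
#endif

/* Hypothetical helper: load 16 bytes holding four big-endian 32-bit words
 * (e.g. a quarter of a SHA-1 message block) and store them in host order. */
static void load_be32x4(const uint8_t *src, uint32_t out[4])
{
#ifdef HAVE_LE_NEON
    /* Load bytes in memory order, view them as four words (still big-endian). */
    uint32x4_t raw = vreinterpretq_u32_u8(vld1q_u8(src));
    /* A real SHA-1 kernel would keep the swapped vector in a register. */
    vst1q_u32(out, bswap32x4(raw));
#else
    /* Portable fallback: assemble each word from its bytes, most significant first. */
    for (int i = 0; i < 4; i++) {
        const uint8_t *p = src + 4 * i;
        out[i] = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    }
#endif
}
```

In a SHA-1 kernel you would apply the swap to each of the four message vectors loaded from a 64-byte block before passing them to the SHA-1 intrinsics (e.g. vsha1cq_u32); the rest of the kernel is outside the scope of this sketch.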