How to convert little endian PCM audio samples to big endian

I need to fill a datagram buffer with audio samples in network byte order, following the apt-X RTP payload layout in RFC 7310, section 5.5.

The audio samples sit in my application memory as 24-bit packed little-endian samples.

My application buffer is also compacted, and its byte layout is little endian because I am on a PC. Compared with the layout shown at the end of section 5.5 of the RFC, which shows big-endian samples, I have reversed the LSB and MSB order below. Ignore the last byte here: in compacted form, that blank byte would be the LSB of the next audio sample.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      LSB      |       MB      |      MSB      |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   little endian memory layout for 24 bit signed pcm sample.

I need to memcpy my application buffer to the destination datagram buffer, starting at offset 12, because a correctly formatted big-endian RTP header already sits at the start of the buffer.

Should I std::reverse my application buffer and then call memcpy, or do I need some special endian library call that understands the layout of the 24-bit signed samples? I believe the nomenclature for these samples is spcm24-le (signed PCM, 24 bit, little endian).

Imagine the audio samples as an array std::uint32_t[number of samples] in Intel memory (as they arrive from the microphone). To convert these to spcm24-le, I can simply shift each sample right by 8 bits, which gives a signed pcm24 value (still little endian), but I don't know how to insert the result into the target datagram buffer so that it is packed and organized per the RFC.

A simple example in C++ would be great, perhaps using std::endian, std::reverse or std::byteswap?

   For the example format, the diagram below shows how coded samples
   from each channel are packed into a sample block and how sample
   blocks 1, 2, and 48 are subsequently packed into the RTP packet.

      C:
      Channel index: Left (l) = 1, left center (lc) = 2,
      center (c) = 3, right (r) = 4, right center (rc) = 5,
      and surround (S) = 6.

      T:
      Sample Block time index: The first sample block is 1; the final
      sample is 48.

      S(C)(T):
      The Tth sample from channel C.

Lindsay & Foerster           Standards Track                    [Page 8]
RFC 7310                    apt-X RTP Format                   July 2014

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    S(1)(1)                    |    S(2)(1)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    S(2)(1)    |            S(3)(1)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    S(3)(1)    |                   S(4)(1)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    S(5)(1)                    |    S(6)(1)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    S(6)(1)    |            S(1)(2)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    S(2)(2)    |                   S(3)(2)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    S(4)(2)                    |    S(5)(2)    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    S(5)(2)    |            S(6)(2)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    S(6)(2)    |                   S(1)(3)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            S(6)(47)           |            S(1)(48)           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    S(1)(48)   |                   S(2)(48)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    S(3)(48)                   |    S(4)(48)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   S(4)(48)    |           S(5)(48)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    S(5)(48)   |                   S(6)(48)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   For the example format, the diagram below indicates the order that
   coded bytes are packed into the packet payload in terms of sample
   byte significance.  The following abbreviations are used.

      MSB:
      Most Significant Byte of a 24-bit coded sample

      MB:
      Middle Byte of a 24-bit coded sample

      LSB:
      Least Significant Byte of a 24-bit coded sample


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      MSB      |       MB      |      LSB      |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Solution

I would use C++23 for readability and enhanced safety. The logic can then be backported to older standards:

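#include <array>
#include <bit>
#include <cstddef>
#include <ranges>
#include <span>
#include <utility>
#include <vector>

// Swap the outer bytes of every packed 24-bit sample: LSB MB MSB -> MSB MB LSB.
void in_place_reorder(std::span<std::byte> buffer){
    namespace rv = std::views;
    // Natively ordered samples need swapping only on a little-endian host;
    // on a big-endian host they are already in network byte order.
    if (std::endian::native == std::endian::little)
        for(auto && [first, _ , last]: buffer | rv::adjacent<3> | rv::stride(3))
            std::swap(first,last);
}

std::vector<std::byte> in_vec{/*...*/};
std::array<std::byte, N> in_arr{/*...*/};
in_place_reorder(in_vec);
in_place_reorder(in_arr);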

If the source is supposed to stay intact I would just create an adapter:

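#include <algorithm>
#include <cstddef>
#include <ranges>
#include <vector>

namespace rv = std::views;
// Split into 3-byte samples, reverse each sample, then flatten back to bytes.
auto constexpr reorder = rv::chunk(3) | rv::transform(rv::reverse) | rv::join;

std::vector<std::byte> input{/*...*/};
auto output = input | reorder | std::ranges::to<std::vector>();

std::vector<std::byte> output2(size(input));
std::ranges::copy(input | reorder, begin(output2));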

If C++23 is not available, the equivalent can be done with the open-source range-v3 library. As a last resort, the logic can be converted to old-school for loops, taking great care to avoid common bugs:

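#include <bit>
#include <cstddef>
#include <span>
#include <utility>
#include <vector>

void in_place_reorder(std::span<std::byte> buffer){
    // Natively ordered samples need swapping only on a little-endian host.
    if (std::endian::native == std::endian::little)
        // "<=" rather than "<", so that the final sample is swapped as well.
        for(std::size_t i = 0; (i + 3) <= buffer.size(); i += 3)
            std::swap(buffer[i],buffer[i+2]);
}

std::vector<std::byte> in_vec{/*...*/};
auto out_vec = in_vec;
in_place_reorder(out_vec);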

A hand-written alternative that keeps the source intact is also possible. It would be a bit faster than the sample above, but it is harder to read and more error-prone, so it needs extra care. I will skip that part because my obsession with good design complicates things.
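For reference, here is a minimal sketch of how the copying variant might look with plain loops, written so that it compiles as C++17. It is an illustration under stated assumptions, not a definitive implementation: the names kRtpHeaderSize, copy_s24le_as_be and pack_s32_as_s24be are hypothetical, the buffers are assumed to be std::vector<std::uint8_t>, and the 32-bit case assumes the 24 significant bits sit in the top three bytes of each sample (the "shift right by 8" situation from the question).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constant: size of the fixed RTP header that precedes the payload.
constexpr std::size_t kRtpHeaderSize = 12;

// Copy packed little-endian 24-bit samples (LSB, MB, MSB) from `src` into
// `datagram` starting at `offset`, writing each sample as MSB, MB, LSB.
// `src` is left untouched. Trailing bytes that do not form a whole sample
// are ignored. Returns the number of payload bytes written.
inline std::size_t copy_s24le_as_be(const std::vector<std::uint8_t>& src,
                                    std::vector<std::uint8_t>& datagram,
                                    std::size_t offset = kRtpHeaderSize)
{
    const std::size_t payload = src.size() - src.size() % 3;
    assert(datagram.size() >= offset + payload);
    for (std::size_t i = 0; i < payload; i += 3) {
        datagram[offset + i]     = src[i + 2]; // MSB
        datagram[offset + i + 1] = src[i + 1]; // MB
        datagram[offset + i + 2] = src[i];     // LSB
    }
    return payload;
}

// Pack samples held as 32-bit integers whose 24 significant bits occupy the
// top three bytes. The bytes are extracted by shifting the value, so the
// result does not depend on the byte order of the host.
inline std::size_t pack_s32_as_s24be(const std::vector<std::uint32_t>& samples,
                                     std::vector<std::uint8_t>& datagram,
                                     std::size_t offset = kRtpHeaderSize)
{
    assert(datagram.size() >= offset + samples.size() * 3);
    std::size_t out = offset;
    for (std::uint32_t s : samples) {
        datagram[out++] = static_cast<std::uint8_t>(s >> 24); // MSB
        datagram[out++] = static_cast<std::uint8_t>(s >> 16); // MB
        datagram[out++] = static_cast<std::uint8_t>(s >> 8);  // LSB
    }
    return out - offset;
}
```

Both functions write straight into the datagram after the header, so no separate memcpy or temporary buffer is needed. The first treats the source bytes as little endian regardless of the host, which matches the spcm24-le buffer described in the question.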