I'm looking for an efficient way to bit shift left (<<) 10-bit values that are stored within a byte array using C++/Win32.
I am receiving an uncompressed 4:2:2 10-bit video stream via UDP. The data is stored in an unsigned char array because of the way the bits are packed.
The data is always sent so that groups of pixels finish on a byte boundary (in this case, 4 pixels sampled at a bit depth of 10 use 40 bits, i.e. exactly 5 bytes).
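The 5-byte figure follows directly from the bit arithmetic: 4 values × 10 bits = 40 bits = 5 bytes, so a payload of N bytes holds N / 5 groups and (N / 5) × 4 values. A minimal sketch of that bookkeeping, assuming payloads always end on a group boundary as stated above; the constant and function names (`kBytesPerGroup`, `IsWholeGroups()`, `OutputWordCount()`) are hypothetical and chosen only for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants describing the packing in this stream:
// 4 values x 10 bits = 40 bits = exactly 5 bytes per group.
constexpr std::size_t kBitsPerValue = 10;
constexpr std::size_t kValuesPerGroup = 4;
constexpr std::size_t kBytesPerGroup = kBitsPerValue * kValuesPerGroup / 8;
static_assert(kBytesPerGroup == 5, "4 x 10-bit values must pack into exactly 5 bytes");

// True if the payload length is a whole number of 5-byte groups.
constexpr bool IsWholeGroups(std::size_t nBytes)
{
    return nBytes % kBytesPerGroup == 0;
}

// Number of 16-bit output words needed for a payload of nBytes packed bytes.
std::size_t OutputWordCount(std::size_t nBytes)
{
    assert(IsWholeGroups(nBytes) && "payload must end on a group boundary");
    return (nBytes / kBytesPerGroup) * kValuesPerGroup;
}
```

For a 1200-byte payload this gives 240 groups, i.e. 960 output words.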
The renderer I am using (Media Foundation Enhanced Video Renderer) requires that each 10-bit value be placed into a 16-bit WORD with 6 padding bits to the right. Whilst this is annoying, I assume it's to help them ensure a 1-byte memory alignment.
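Placing a single, already-extracted 10-bit value into that layout is just a left shift by 6 (equivalently, a multiplication by 64): the largest 10-bit value, 0x3FF, becomes 0xFFC0. A minimal sketch, with hypothetical helper names `ToMsbAlignedWord()` and `FromMsbAlignedWord()`:

```cpp
#include <cassert>
#include <cstdint>

// Place one 10-bit value into the top 10 bits of a 16-bit word,
// leaving the 6 low (padding) bits zero.
inline uint16_t ToMsbAlignedWord(uint16_t value10)
{
    assert(value10 <= 0x3FF && "input must fit in 10 bits");
    return static_cast<uint16_t>(value10 << 6);
}

// Inverse: recover the 10-bit value from an MSB-aligned word.
inline uint16_t FromMsbAlignedWord(uint16_t word)
{
    return static_cast<uint16_t>(word >> 6);
}
```

The shift itself is cheap; the harder part of the problem is extracting each 10-bit value in the first place, since values straddle byte boundaries in the packed input.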
What is an efficient way of left-shifting each 10-bit value by 6 bits (and moving it to a new array if needed)? Although I will be receiving varying lengths of data, they will always be composed of these 40-bit blocks.
I'm sure a crude loop with some bit-masking(?) would suffice, but that sounds expensive to me, and I have to process 1500 packets/second, each with ~1200 bytes of payload.
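For a sense of scale: ~1200 bytes per packet is 240 five-byte groups, or 960 values, so 1500 packets/second works out to about 1.8 MB/s of input and roughly 1.44 million values per second. At a few shifts and masks per value, a straightforward loop should keep up comfortably, though that is an estimate, not a measurement. A minimal sketch of such a loop, assuming the payload length is a multiple of 5 bytes and using the same masks as `ProcessPGroup()` in the answer below; `UnpackPayload()` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: unpack a whole payload of 5-byte groups into
// MSB-aligned 16-bit words. `out` must have room for (nBytes / 5) * 4
// elements. Returns the number of words written.
std::size_t UnpackPayload(const uint8_t* in, std::size_t nBytes, uint16_t* out)
{
    assert(nBytes % 5 == 0 && "payload must be a whole number of 5-byte groups");
    std::size_t nOut = 0;
    for (std::size_t i = 0; i + 5 <= nBytes; i += 5)
    {
        const uint8_t* g = in + i;
        // Each word takes the next 10 bits MSB-first and leaves them in the top of the word.
        out[nOut++] = static_cast<uint16_t>((g[0] << 8) | (g[1] & 0xC0));
        out[nOut++] = static_cast<uint16_t>(((g[1] & 0x3F) << 10) | ((g[2] & 0xF0) << 2));
        out[nOut++] = static_cast<uint16_t>(((g[2] & 0x0F) << 12) | ((g[3] & 0xFC) << 4));
        out[nOut++] = static_cast<uint16_t>(((g[3] & 0x03) << 14) | (g[4] << 6));
    }
    return nOut;
}
```

The caller is responsible for allocating the output buffer, e.g. 960 words for a 1200-byte payload.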
Edit for clarity
Example Input:
unsigned char byteArray[5] = {0b01110101, 0b01111010, 0b00001010, 0b11111010, 0b00000110};
Desired Output:
WORD wordArray[4] = {0b0111010101000000, 0b1110100000000000, 0b1010111110000000, 0b1000000110000000};
(or the same resulting data in a byte array)
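One way to check the example by hand is to treat the input as a single MSB-first bit stream and read 10 bits at a time: 0111010101 | 1110100000 | 1010111110 | 1000000110. Each of those, followed by six zero bits, is one of the desired output words. The same idea written as a deliberately simple, bit-at-a-time reader (slow, but handy as a reference when testing faster code); `ReadValueMsbAligned()` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference reader: returns the index-th 10-bit value from a packed MSB-first
// stream, left-aligned in a 16-bit word with 6 zero padding bits on the right.
uint16_t ReadValueMsbAligned(const uint8_t* bytes, std::size_t nBytes, std::size_t index)
{
    const std::size_t firstBit = index * 10;  // bit offset of this value in the stream
    assert(firstBit + 10 <= nBytes * 8 && "value lies past the end of the buffer");

    uint16_t value = 0;
    for (std::size_t b = firstBit; b < firstBit + 10; ++b)
    {
        // Bit b lives in byte b / 8; within each byte the stream starts at the MSB.
        const unsigned bit = (bytes[b / 8] >> (7 - b % 8)) & 1u;
        value = static_cast<uint16_t>((value << 1) | bit);
    }
    return static_cast<uint16_t>(value << 6);
}
```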
This does the job:
void ProcessPGroup(const uint8_t byteArrayIn[5], uint16_t twoByteArrayOut[4])
{
    twoByteArrayOut[0] = (((uint16_t)byteArrayIn[0] & 0b11111111u) << (0 + 8)) | (((uint16_t)byteArrayIn[1] & 0b11000000u) << 0);
    twoByteArrayOut[1] = (((uint16_t)byteArrayIn[1] & 0b00111111u) << (2 + 8)) | (((uint16_t)byteArrayIn[2] & 0b11110000u) << 2);
    twoByteArrayOut[2] = (((uint16_t)byteArrayIn[2] & 0b00001111u) << (4 + 8)) | (((uint16_t)byteArrayIn[3] & 0b11111100u) << 4);
    twoByteArrayOut[3] = (((uint16_t)byteArrayIn[3] & 0b00000011u) << (6 + 8)) | (((uint16_t)byteArrayIn[4] & 0b11111111u) << 6);
}
Don't be confused by the [5] and [4] values in the function signature above. The compiler ignores them (array parameters decay to plain pointers); they only tell you, the user, the mandatory, expected number of elements in each array. See my answer here on this: Passing an array as an argument to a function in C. Passing an array that is shorter will result in undefined behavior and is a bug!
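Because those sizes are ignored, nothing stops a caller from passing a shorter array. If a compile-time check is wanted, C++ can take the arrays by reference instead, which makes the size part of the parameter type. A sketch of that variant, with the hypothetical name `ProcessPGroupRef()` and the same masks as `ProcessPGroup()`:

```cpp
#include <cassert>
#include <cstdint>

// Same bit manipulation as ProcessPGroup(), but the parameters are references
// to arrays, so only arrays of exactly 5 and 4 elements are accepted.
void ProcessPGroupRef(const uint8_t (&in)[5], uint16_t (&out)[4])
{
    out[0] = static_cast<uint16_t>((in[0] << 8) | (in[1] & 0xC0));
    out[1] = static_cast<uint16_t>(((in[1] & 0x3F) << 10) | ((in[2] & 0xF0) << 2));
    out[2] = static_cast<uint16_t>(((in[2] & 0x0F) << 12) | ((in[3] & 0xFC) << 4));
    out[3] = static_cast<uint16_t>(((in[3] & 0x03) << 14) | (in[4] << 6));
}

// What the compiler now rejects (commented out so this file compiles):
//   uint8_t tooShort[3] = {};
//   uint16_t out[4];
//   ProcessPGroupRef(tooShort, out);  // error: uint8_t[3] cannot bind to const uint8_t (&)[5]
```

The trade-off is that a raw pointer into a larger packet buffer, as you would have inside a per-packet loop, cannot be passed to this version directly, so the pointer-based signature is more convenient there.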
Full test code (download it in my eRCaGuy_hello_world repo here: cpp/process_10_bit_video_data.cpp):
test.cpp
/*
GS
17 Mar. 2021

To compile and run:
    mkdir -p bin && g++ -Wall -Wextra -Werror -ggdb -std=c++17 -o bin/test \
    test.cpp && bin/test
*/

#include <bitset>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <iostream>

// Get the number of elements in any C array
// - Usage example: [my own answer]:
//   https://arduino.stackexchange.com/questions/80236/initializing-array-of-structs/80289#80289
#define ARRAY_LEN(array) (sizeof(array)/sizeof(array[0]))

/// \brief      Process a packed video P group, which is 4 pixels of 10 bits each (exactly 5 uint8_t
///             bytes) into a uint16_t 4-element array (1 element per pixel).
/// \details    Each group of 10-bits for a pixel will be placed into a 16-bit word, with all 10
///             bits left-shifted to the far left edge, leaving 6 empty (zero) bits in the right
///             side of the word.
/// \param[in]  byteArrayIn  5 bytes of 10-bit pixel data for exactly 4 pixels; any array size < 5
///             will result in undefined behavior! So, ensure you pass the proper array
///             size in!
/// \param[out] twoByteArrayOut  The output array into which the 4 pixels will be packed, 10 bits per
///             16-bit word, all 10 bits shifted to the left edge; any array size < 4
///             will result in undefined behavior!
/// \return     None
void ProcessPGroup(const uint8_t byteArrayIn[5], uint16_t twoByteArrayOut[4])
{
    twoByteArrayOut[0] = (((uint16_t)byteArrayIn[0] & 0b11111111u) << (0 + 8)) | (((uint16_t)byteArrayIn[1] & 0b11000000u) << 0);
    twoByteArrayOut[1] = (((uint16_t)byteArrayIn[1] & 0b00111111u) << (2 + 8)) | (((uint16_t)byteArrayIn[2] & 0b11110000u) << 2);
    twoByteArrayOut[2] = (((uint16_t)byteArrayIn[2] & 0b00001111u) << (4 + 8)) | (((uint16_t)byteArrayIn[3] & 0b11111100u) << 4);
    twoByteArrayOut[3] = (((uint16_t)byteArrayIn[3] & 0b00000011u) << (6 + 8)) | (((uint16_t)byteArrayIn[4] & 0b11111111u) << 6);
}

// Reference: https://stackoverflow.com/questions/7349689/how-to-print-using-cout-a-number-in-binary-form/7349767
void PrintArrayAsBinary(const uint16_t* twoByteArray, size_t len)
{
    std::cout << "{\n";
    for (size_t i = 0; i < len; i++)
    {
        std::cout << std::bitset<16>(twoByteArray[i]);
        if (i < len - 1)
        {
            std::cout << ",";
        }
        std::cout << std::endl;
    }
    std::cout << "}\n";
}

int main()
{
    printf("Processing 10-bit video data example\n");

    constexpr uint8_t TEST_BYTE_ARRAY_INPUT[5] = {0b01110101, 0b01111010, 0b00001010, 0b11111010, 0b00000110};
    constexpr uint16_t TEST_TWO_BYTE_ARRAY_OUTPUT[4] = {
        0b0111010101000000, 0b1110100000000000, 0b1010111110000000, 0b1000000110000000};

    uint16_t twoByteArrayOut[4];
    ProcessPGroup(TEST_BYTE_ARRAY_INPUT, twoByteArrayOut);

    if (std::memcmp(twoByteArrayOut, TEST_TWO_BYTE_ARRAY_OUTPUT, sizeof(TEST_TWO_BYTE_ARRAY_OUTPUT)) == 0)
    {
        printf("TEST PASSED!\n");
    }
    else
    {
        printf("TEST ==FAILED!==\n");

        std::cout << "expected = \n";
        PrintArrayAsBinary(TEST_TWO_BYTE_ARRAY_OUTPUT, ARRAY_LEN(TEST_TWO_BYTE_ARRAY_OUTPUT));

        std::cout << "actual = \n";
        PrintArrayAsBinary(twoByteArrayOut, ARRAY_LEN(twoByteArrayOut));
    }

    return 0;
}
Sample run and output:
$ mkdir -p bin && g++ -Wall -Wextra -Werror -ggdb -std=c++17 \
-o bin/test test.cpp && bin/test
Processing 10-bit video data example
TEST PASSED!
I've now also placed this code into my eRCaGuy_hello_world repo here: cpp/process_10_bit_video_data.cpp.
For the ARRAY_LEN() macro, see utilities.h.
Keywords: c and c++ bitmasking and bit-shifting, bit-packing; bit-masking bit masking, bitshifting bit shifting, bitpacking bit packing, byte packing, lossless data compression
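For comparison, here is a different formulation of the same per-group conversion (an alternative technique, not the method in `ProcessPGroup()`): load the 5 bytes into the low 40 bits of a 64-bit integer, most significant byte first, then slice out each 10-bit field with a shift and a 0x3FF mask. `ProcessPGroup64()` is a hypothetical name; whether it is faster or slower than the four hand-written expressions depends on the compiler and target, so it would need to be measured.

```cpp
#include <cassert>
#include <cstdint>

// Alternative formulation: gather the 40 packed bits into one integer,
// then extract the four 10-bit fields from most to least significant.
void ProcessPGroup64(const uint8_t in[5], uint16_t out[4])
{
    uint64_t bits = 0;
    for (int i = 0; i < 5; ++i)
    {
        bits = (bits << 8) | in[i];  // in[0] ends up in bits 39..32
    }
    for (int k = 0; k < 4; ++k)
    {
        // Value k occupies bits (39 - 10k) down to (30 - 10k): shift down, mask, left-align.
        const uint16_t value10 = static_cast<uint16_t>((bits >> (30 - 10 * k)) & 0x3FFu);
        out[k] = static_cast<uint16_t>(value10 << 6);
    }
}
```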