How do I create a Python bytes object in the C API?

I have a NumPy vector of bools and I'm trying to use the C API to get a bytes object from it as quickly as possible. (Ideally, each bool in the vector would map to one bit of the bytes object.)

I can read in the vector successfully and I have the data in bool_vec_arr. I thought of creating an int and setting its bits in this way:

PyBytesObject * pbo; 
int byte = 0;
int i = 0;
while ( i < vec->dimensions[0] )  
{
    if ( bool_vec_arr[i] )
    {
        byte |= 1UL << i % 8;
    }
    i++;
    if (i % 8 == 0)
    {
        /* do something here? */
        byte = 0;
    }
}
return Py_BuildValue("S", pbo);

But I'm not sure how to use the value of byte in pbo. Does anyone have any suggestions?

Solution

You need to store each byte as you complete it. The problem is that you never create an actual bytes object to populate. You know how long the result must be (one-eighth the length of the bool vector, rounded up), so call PyBytes_FromStringAndSize to get a bytes object of that size, then fill it in as you go.

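Allocate it as follows (PyBytes_FromStringAndSize returns a PyObject *, so declare pbo with that type):

// Preallocate enough bytes; passing NULL leaves the contents uninitialized
PyObject *pbo = PyBytes_FromStringAndSize(NULL, (vec->dimensions[0] + 7) / 8);
if (pbo == NULL)
{
    return NULL;  // Allocation failed; the exception is already set
}

// Extract pointer to underlying buffer
char *bytebuffer = PyBytes_AsString(pbo);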

Adding 7 before dividing by 8 rounds up, so there are enough bytes for all the bits. Then assign to the appropriate index each time you finish a byte, e.g.:

if (i % 8 == 0)
{
    bytebuffer[i / 8 - 1] = (char)byte;  // i was already incremented, so the finished byte belongs at i / 8 - 1
    byte = 0;
}

If the length of the vector isn't a multiple of 8, the final byte is incomplete and the loop above never stores it, so you'll need to decide how to handle it: do the pad bits go on the left or the right, or is the final byte omitted (in which case you shouldn't round up the allocation)?
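For reference, here is a minimal sketch of the packing step on its own, written against plain buffers so it can be tested without Python. The function name pack_bools_lsb_first is hypothetical. It assumes each bool occupies one byte, that the input is contiguous, and that a trailing partial byte should keep its pad bits as zeros in the high positions; pick a different policy if that doesn't suit you.

```c
#include <stddef.h>

/* Pack n one-byte boolean flags into (n + 7) / 8 output bytes.
   Flag i sets bit (i % 8) of out[i / 8], i.e. least significant bit first,
   matching the `1UL << i % 8` convention in the question.
   Pad bits of a trailing partial byte are left as zero. */
static void pack_bools_lsb_first(const unsigned char *flags, size_t n,
                                 unsigned char *out)
{
    size_t nbytes = (n + 7) / 8;

    /* The destination may be uninitialized, so clear it first. */
    for (size_t b = 0; b < nbytes; b++) {
        out[b] = 0;
    }

    for (size_t i = 0; i < n; i++) {
        if (flags[i]) {
            out[i / 8] |= (unsigned char)(1u << (i % 8));
        }
    }
}
```

In the extension, the call might look like pack_bools_lsb_first((const unsigned char *)bool_vec_arr, (size_t)vec->dimensions[0], (unsigned char *)bytebuffer), assuming bool_vec_arr points at contiguous one-byte bools. Because pbo is already a new reference, a plain `return pbo;` should be enough; Py_BuildValue("S", pbo) takes an additional reference to the object, which would leak the original one.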