Search code examples
cembeddedtransposebitarraydspic

What is the fastest way to transpose the bits in an 8x8 block on bits?


I'm not sure the exact term for what I'm trying to do. I have an 8x8 block of bits stored in 8 bytes, each byte stores one row. When I'm finished, I'd like each byte to store one column.

For example, when I'm finished:

Byte0out = Byte0inBit0 + Bit0inByte1 + Bit0inByte2 + Bit0inByte3 + ...
Byte1out = Bit1inByte0 + Bit1inByte1 + Bit1inByte2 + Bit1inByte3 + ...

What is the easiest way to do this in C which performs well? This will run on a dsPIC microcontroller


Solution

  • This code is cribbed directly from "Hacker's Delight" - Figure 7-2 Transposing an 8x8-bit matrix, I take no credit for it:

    void transpose8(unsigned char A[8], int m, int n, 
                    unsigned char B[8]) {
       unsigned x, y, t; 
    
       // Load the array and pack it into x and y. 
    
       x = (A[0]<<24)   | (A[m]<<16)   | (A[2*m]<<8) | A[3*m]; 
       y = (A[4*m]<<24) | (A[5*m]<<16) | (A[6*m]<<8) | A[7*m]; 
    
       t = (x ^ (x >> 7)) & 0x00AA00AA;  x = x ^ t ^ (t << 7); 
       t = (y ^ (y >> 7)) & 0x00AA00AA;  y = y ^ t ^ (t << 7); 
    
       t = (x ^ (x >>14)) & 0x0000CCCC;  x = x ^ t ^ (t <<14); 
       t = (y ^ (y >>14)) & 0x0000CCCC;  y = y ^ t ^ (t <<14); 
    
       t = (x & 0xF0F0F0F0) | ((y >> 4) & 0x0F0F0F0F); 
       y = ((x << 4) & 0xF0F0F0F0) | (y & 0x0F0F0F0F); 
       x = t; 
    
       B[0]=x>>24;    B[n]=x>>16;    B[2*n]=x>>8;  B[3*n]=x; 
       B[4*n]=y>>24;  B[5*n]=y>>16;  B[6*n]=y>>8;  B[7*n]=y; 
    }
    

    I didn't check if this rotates in the direction you need, if not you might need to adjust the code.

    Also, keep in mind datatypes & sizes - int & unsigned (int) might not be 32 bits on your platform.

    BTW, I suspect the book (Hacker's Delight) is essential for the kind of work you're doing... check it out, lots of great stuff in there.