Tags: python, parsing, python-3.x, bit-manipulation, bin

In Python, how do I parse the 11th and 12th bits of 3 bytes?


If I have 3 bytes b'\x00\x0c\x00', which can be represented with the bits 00000000 00001100 00000000, how do I most efficiently extract the 11th and 12th bits (here 11)?

Here are the bit positions:

             **
00000000 11111110 22222111 tens
87654321 65432109 43210987 ones
|||||||| |||||||| ||||||||
00000000 00001100 00000000
             **

I have the following code:

bytes_input = b'\x00\x0c\x00'
for byte in bytes_input:
    print(byte, '{:08b}'.format(byte), bin(byte))
bit_position = 11-1  # zero-based index of the 11th bit
bits_per_byte = 8
floor = bit_position//bits_per_byte
print('floor', floor)
byte = bytes_input[floor]
print('byte', byte, type(byte))
modulo = bit_position%bits_per_byte
print('modulo', modulo)
bits = bin(byte >> modulo & 3)
print('bits', bits, type(bits))

Which returns:

0 00000000 0b0
12 00001100 0b1100
0 00000000 0b0
floor 1
byte 12 <class 'int'>
modulo 2
bits 0b11 <class 'str'>

Is there a computationally faster way for me to get the information that doesn't require me to calculate floor and modulo?

To put things into context I am parsing this file format: http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml

Update 01feb2015:

Thanks to @Dunes I read the documentation on int.from_bytes and found that I can avoid the divmod by calling int.from_bytes with byteorder='little'. The final function I adapted into my code is fsmall. I can't get timeit to work, so I'm not sure about the relative speeds of the functions.

bytes_input = b'\x00\x0c\x00'
bit_position = 11-1
bpb = bits_per_byte = 8

def foriginal(bytes_input, bit_position):
    floor = bit_position//bpb
    byte = bytes_input[floor]
    modulo = bit_position%bpb
    return byte >> modulo & 0b11

def fdivmod(bytes_input, bit_position):
    div, mod = divmod(bit_position, bpb)
    return bytes_input[div] >> mod & 0b11

def fsmall(bytes_input, bit_position):
    int_bytes = int.from_bytes(bytes_input, byteorder='little')
    shift = bit_position
    bits = int_bytes >> shift & 0b11
    return bits
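
Since timeit is mentioned above, here is a minimal sketch of one way the three functions could be compared from a plain script. It assumes Python 3.6 or later; the names candidates and results and the repeat count of 100,000 are illustrative choices, not part of the original code. Passing a callable to timeit.timeit avoids the setup-string imports that often trip people up, at the cost of including the lambda's call overhead in every measurement.

```python
import timeit

bytes_input = b'\x00\x0c\x00'
bit_position = 11 - 1  # zero-based index of the 11th bit
BITS_PER_BYTE = 8

def foriginal(data, pos):
    # Separate floor division and modulo, as in the first version.
    return data[pos // BITS_PER_BYTE] >> (pos % BITS_PER_BYTE) & 0b11

def fdivmod(data, pos):
    # One divmod call instead of two separate operations.
    div, mod = divmod(pos, BITS_PER_BYTE)
    return data[div] >> mod & 0b11

def fsmall(data, pos):
    # Convert all bytes to one int; the first byte is least significant.
    return int.from_bytes(data, byteorder='little') >> pos & 0b11

candidates = (foriginal, fdivmod, fsmall)  # illustrative name

# Check that all three agree before comparing their speed.
results = {f.__name__: f(bytes_input, bit_position) for f in candidates}

number = 100_000  # arbitrary repeat count for this sketch
for f in candidates:
    seconds = timeit.timeit(lambda: f(bytes_input, bit_position), number=number)
    print(f'{f.__name__}: {seconds / number * 1e9:.0f} ns per call')
```

The absolute numbers depend on the machine and Python version, so only the relative ordering is worth reading from the output.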

Solution

  • You could try:

    (int.from_bytes(bytes_input, 'big') >> bit_position) & 0b11
    

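    It doesn't appear to be any quicker, though, just terser. Note that 'big' gives the right answer here only because the sample input reads the same in both byte orders; for the layout in the question, where the first byte holds bits 1 to 8, the byte order should be 'little'.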

    However, int.from_bytes(bytes_input, 'big') is the most time-consuming part of that snippet, taking roughly twice as long as the shift and mask (see the timings below). If you can convert your data from bytes to int once, at the beginning of the program, the bit-masking operations themselves will be quicker.

    In [52]: %timeit n = int.from_bytes(bytes_input, 'big')
    1000000 loops, best of 3: 237 ns per loop
    
    In [53]: %timeit n >> bit_position & 0b11
    10000000 loops, best of 3: 107 ns per loop
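
The "convert once, then mask" idea could look roughly like the sketch below. The helper name two_bit_fields is hypothetical, and the code assumes that every field is 2 bits wide and starts on an even bit offset, as bits 11 and 12 do in the question. It uses 'little' so that the first byte holds the lowest bit positions, matching the question's diagram.

```python
def two_bit_fields(data):
    """Yield successive 2-bit fields, lowest bits of the first byte first.

    Hypothetical helper: assumes fields are 2 bits wide and aligned on
    even bit offsets.
    """
    # Convert once, so each lookup is only a shift and a mask.
    n = int.from_bytes(data, byteorder='little')
    for shift in range(0, len(data) * 8, 2):
        yield n >> shift & 0b11

fields = list(two_bit_fields(b'\x00\x0c\x00'))
# Bits 11 and 12 (one-based) form the field at zero-based offset 10,
# which is index 10 // 2 == 5 in the list.
print(fields[5])
```

For large inputs the single int can become very big, so processing the data in chunks may be a better fit; that trade-off is not measured here.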