If I have 3 bytes b'\x00\x0c\x00', which can be represented with the bits 00000000 00001100 00000000, how do I then parse the 11th and 12th bits (11) most efficiently?
Here are the bit positions:
00000000 11111110 22222111 tens
87654321 65432109 43210987 ones
|||||||| |||||||| ||||||||
00000000 00001100 00000000
I have the following code:
bytes_input = b'\x00\x0c\x00'
for byte in bytes_input:
    print(byte, '{:08b}'.format(byte), bin(byte))
bit_position = 11-1
bits_per_byte = 8
floor = bit_position//bits_per_byte
print('floor', floor)
byte = bytes_input[floor]
print('byte', byte, type(byte))
modulo = bit_position%bits_per_byte
print('modulo', modulo)
bits = bin(byte >> modulo & 3)
print('bits', bits, type(bits))
Which returns:
0 00000000 0b0
12 00001100 0b1100
0 00000000 0b0
floor 1
byte 12 <class 'int'>
modulo 2
bits 0b11 <class 'str'>
Is there a computationally faster way for me to get the information that doesn't require me to calculate floor and modulo?
To put things into context, I am parsing this file format: http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml
Update 01feb2015:
Thanks to @Dunes I read the documentation on from_bytes and found out that I can avoid doing divmod by just doing int.from_bytes with byteorder='little'. The final function I adapted into my code is fsmall. I can't get timeit to work, so I'm not sure about the relative speeds of the functions.
bytes_input = b'\x00\x0c\x00'
bit_position = 11-1
bpb = bits_per_byte = 8
def foriginal(bytes_input, bit_position):
    floor = bit_position//bpb
    byte = bytes_input[floor]
    modulo = bit_position%bpb
    return byte >> modulo & 0b11

def fdivmod(bytes_input, bit_position):
    div, mod = divmod(bit_position, bpb)
    return bytes_input[div] >> mod & 0b11

def fsmall(bytes_input, bit_position):
    int_bytes = int.from_bytes(bytes_input, byteorder='little')
    shift = bit_position
    bits = int_bytes >> shift & 0b11
    return bits
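Since timeit is mentioned above as not working, here is a minimal sketch of one way to time the three functions with the standard-library timeit module. The helper name time_functions and the number/repeat values are my own choices for illustration, not part of the original code, and no timings are quoted because they depend on the machine and the Python version.

```python
import timeit

bytes_input = b'\x00\x0c\x00'
bit_position = 11 - 1
bpb = 8  # bits per byte


def foriginal(bytes_input, bit_position):
    floor = bit_position // bpb
    modulo = bit_position % bpb
    return bytes_input[floor] >> modulo & 0b11


def fdivmod(bytes_input, bit_position):
    div, mod = divmod(bit_position, bpb)
    return bytes_input[div] >> mod & 0b11


def fsmall(bytes_input, bit_position):
    int_bytes = int.from_bytes(bytes_input, byteorder='little')
    return int_bytes >> bit_position & 0b11


def time_functions(funcs, number=100000, repeat=5):
    """Hypothetical helper: best observed time per call, in seconds."""
    results = {}
    for func in funcs:
        # Passing a callable (rather than a string statement) avoids the
        # namespace problems that often make timeit fail in scripts.
        timer = timeit.Timer(lambda f=func: f(bytes_input, bit_position))
        best = min(timer.repeat(repeat=repeat, number=number))
        results[func.__name__] = best / number
    return results


if __name__ == '__main__':
    timings = time_functions([foriginal, fdivmod, fsmall])
    for name, seconds in timings.items():
        print('{:10s} {:.1f} ns per call'.format(name, seconds * 1e9))
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; the lambda adds a small constant overhead to every measurement, so only the differences between functions are meaningful.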
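You could try:
(int.from_bytes(bytes_input, 'big') >> bit_position) & 0b11
Note that 'big' agrees with your per-byte indexing here only because the example bytes read the same in both orders; in general, 'little' is the byte order that matches bytes_input[bit_position // 8] >> (bit_position % 8).
It doesn't appear to be any quicker though, just terser.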
However, int.from_bytes(bytes_input, 'big') is the most time-consuming part of that code snippet, by a factor of about 2 to 1. If you can convert your data from bytes to int once, at the beginning of the program, then you will see quicker bit masking operations.
In [52]: %timeit n = int.from_bytes(bytes_input, 'big')
1000000 loops, best of 3: 237 ns per loop
In [53]: %timeit n >> bit_position & 0b11
10000000 loops, best of 3: 107 ns per loop
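The convert-once idea can be sketched as follows. The helper names and the sample bytes are made up for illustration, and byteorder 'little' is assumed so that bit position p falls in byte p // 8, as in the per-byte code from the question.

```python
def two_bits_per_byte(data, bit_position):
    """Per-byte lookup, as in the question: pick the byte, then shift."""
    div, mod = divmod(bit_position, 8)
    return data[div] >> mod & 0b11


def two_bits_from_int(n, bit_position):
    """Same lookup on an integer built once from the whole byte string."""
    return n >> bit_position & 0b11


data = b'\x1b\x0c\xe4'  # made-up sample, deliberately not symmetric

# Convert once, then reuse the integer for every lookup.
n_little = int.from_bytes(data, 'little')
n_big = int.from_bytes(data, 'big')

# Every byte-aligned 2-bit field: positions 0, 2, 4, ..., 22.
positions = range(0, len(data) * 8, 2)
fields = [two_bits_from_int(n_little, p) for p in positions]
print(fields)  # [3, 2, 1, 0, 0, 3, 0, 0, 0, 1, 2, 3]

# 'little' matches the per-byte lookup at every position ...
assert all(two_bits_per_byte(data, p) == two_bits_from_int(n_little, p)
           for p in positions)
# ... while 'big' does not for asymmetric input such as this sample.
assert two_bits_from_int(n_big, 0) != two_bits_per_byte(data, 0)
```

For a large file the integer becomes very large and each shift has to touch more of it, so whether this beats per-byte indexing is something to measure on real data rather than assume.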