import wave,struct
f = wave.open('bird.wav', 'r')
for i in range(5,10):
    frame = f.readframes(i)
    print frame
    struct.unpack('<H',frame)
I use the above code to extract bytes from a stereo WAV file in Python. However, instead of bytes I get some gibberish characters. Using the struct.unpack() function I get the following error:
"unpack requires a string argument of length 2"
What changes do I make in the code to print those bytes as 1's and 0's? I want to later modify the LSB of the audio frames for steganography.
I'm not sure why you want to print those bytes in binary, but it's easy enough to do so.
You need to convert the bytes to integers and then format them using the str.format method; the old %-style formatting doesn't do bits.
The simple way to do that conversion is with the ord function, but for large numbers of bytes it's better to convert them in one hit by creating a bytearray.
#Some bytes, using hexadecimal escape codes
s = '\x01\x07\x0f\x35\xad\xff'
print ' '.join(['{0:08b}'.format(ord(c)) for c in s])
b = bytearray(s)
print ' '.join(['{0:08b}'.format(u) for u in b])
output
00000001 00000111 00001111 00110101 10101101 11111111
00000001 00000111 00001111 00110101 10101101 11111111
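As a rough sketch of how this applies to the WAV file in the question: readframes(n) returns n whole frames, and each frame holds one sample per channel, so the result is n * channels * sample width bytes long. That is why unpacking it with a 2-byte format such as '<H' fails. The example below assumes 16-bit signed little-endian PCM (the usual case, but check getsampwidth() on your own file), and it builds a tiny made-up stereo file in memory in place of bird.wav so that it runs on its own. It uses print() so that it also runs on Python 3.

```python
import io
import struct
import wave

# Build a tiny 16-bit stereo WAV in memory (stand-in for bird.wav).
# The sample values are invented purely for illustration.
samples = [1, -2, 300, -400, 32767, -32768]  # 3 frames of (left, right)
buf = io.BytesIO()
w = wave.open(buf, 'wb')
w.setnchannels(2)
w.setsampwidth(2)      # 2 bytes per sample = 16 bits
w.setframerate(8000)
w.writeframes(struct.pack('<6h', *samples))
w.close()

# Read it back the way the question does.
buf.seek(0)
f = wave.open(buf, 'rb')
sample_width = f.getsampwidth()
n_channels = f.getnchannels()
raw = f.readframes(3)  # 3 frames * 2 channels * 2 bytes = 12 bytes
f.close()

# Unpack every sample at once: one 'h' (signed 16-bit) per sample.
n_samples = len(raw) // sample_width
values = struct.unpack('<%dh' % n_samples, raw)

# Show the raw bytes as bits, using the bytearray approach from above.
bits = ' '.join('{0:08b}'.format(u) for u in bytearray(raw))

print(len(raw))
print(values)
print(bits)
```

The format string is built from the actual length of the data, so it keeps working whatever number of frames you read.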
Generally, hexadecimal notation is more convenient to read than binary.
from binascii import hexlify
print hexlify(s)
print ' '.join(['%02X' % u for u in b])
print ' '.join(['%02X' % ord(c) for c in s])
print ' '.join(['{0:02X}'.format(ord(c)) for c in s])
output
01070f35adff
01 07 0F 35 AD FF
01 07 0F 35 AD FF
01 07 0F 35 AD FF
I just saw your comment re steganography. The most convenient way to twiddle the bits of your bytes is to use a bytearray. You can easily convert a bytearray back to a string of bytes using the str function.
print hexlify(str(b))
output
01070f35adff
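For the LSB twiddling itself, something along these lines might do as a starting point. The helper names set_lsbs and get_lsbs are made up for this sketch, and the payload bits are arbitrary; it simply clears the lowest bit of each byte with a mask and ORs in the bit to hide.

```python
def set_lsbs(data, bits):
    """Return a copy of data whose first len(bits) bytes carry bits in their LSBs.

    Hypothetical helper for illustration; not part of any library.
    """
    out = bytearray(data)
    if len(bits) > len(out):
        raise ValueError('not enough bytes to hold the bits')
    for i, bit in enumerate(bits):
        # 0xFE clears the lowest bit, then we OR in the new one.
        out[i] = (out[i] & 0xFE) | (bit & 1)
    return out


def get_lsbs(data, count):
    """Read back the LSBs of the first count bytes (hypothetical helper)."""
    return [u & 1 for u in bytearray(data)[:count]]


carrier = bytearray(b'\x01\x07\x0f\x35\xad\xff')  # same bytes as s above
payload = [0, 1, 1, 0, 0, 1]                      # arbitrary example bits
stego = set_lsbs(carrier, payload)

print(' '.join('{0:08b}'.format(u) for u in stego))
print(get_lsbs(stego, len(payload)))
```

With 16-bit little-endian audio only every other byte (the low byte of each sample) holds a sample's least significant bit, so for real audio data you would probably want to step through the array two bytes at a time rather than touch every byte as this sketch does.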
The string formatting options are described in the official Python docs. For the old %-style formatting, see 5.6.2. String Formatting Operations. For the modern str.format options see 7.1.3. Format String Syntax and 7.1.3.1. Format Specification Mini-Language.
In {0:08b} the 0 before the colon is the field position (which can be omitted in recent versions of Python). It says that we want to apply this formatting code to the first argument of .format, i.e., the argument with index zero. E.g.,
'{0} {2} {1}'.format('one', 'two', 'three')
prints
one three two
The b means we want to print a number as binary. The 08 means we want the output to be at least 8 characters wide, with zero padding for binary numbers that are shorter than 8 bits.
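A few quick variations may make the pieces of that format code easier to see. The values 5 and 300 are arbitrary; note that the width is a minimum, so a number needing more than 8 bits is printed in full rather than truncated.

```python
padded = '{0:08b}'.format(5)     # explicit field index, zero-padded to 8 digits
implicit = '{:08b}'.format(5)    # same thing with the field index omitted
bare = '{0:b}'.format(5)         # no width, so no padding
wide = '{0:08b}'.format(300)     # needs 9 bits, so the output is 9 digits long

print(padded)
print(implicit)
print(bare)
print(wide)
```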
In %02X the uppercase X means we want to print a number as hexadecimal, using uppercase letters A-F for digits greater than 9; we can use lowercase x to get lowercase letters. The 02 means we want the output to be at least 2 characters wide, with zero padding for hexadecimal numbers that are shorter than 2 hex digits.
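The same kind of quick check works for the hexadecimal codes. Again the numbers are arbitrary examples, and the last line shows the equivalent str.format spelling.

```python
upper = '%02X' % 10               # uppercase digits, zero-padded to 2
lower = '%02x' % 173              # lowercase digits
wide = '%02X' % 4095              # wider than 2 digits, printed in full
modern = '{0:02X}'.format(10)     # str.format equivalent of the first line

print(upper)
print(lower)
print(wide)
print(modern)
```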