WAV file modifier using Python


I wrote a simple Python program that reads a WAV file, modifies it, and stores the result as a new file.

import codecs, wave

#convert a number to its two's complemented value (For positive it is equal itself)
def convert_to_twos(value, wid_len=16):
    if value < 0 :
        value = value + (1 << wid_len)
    return value

#receive the value of a two's complemented number.
def twos_back_value(value, wid_len=16):
    if value & (1 << wid_len -1):
        value = value - (1 << wid_len)
    return value

#opening files
input_file = wave.open(r"<address of input wave file>", 'r')
output_file = wave.open(r"<an address for output wave file>", 'w')

#Get input file parameters and set them on the output file after modifying the channel count.
out_params = [None, None, None, None, None, None]
in_params = input_file.getparams()
out_params[0] = 1 # I want to have a mono type wave file in output. so I set the channels = 1
out_params[1] = in_params[1] #Frame Width
out_params[2] = in_params[2] #Sample Rate
out_params[3] = in_params[3] #Number of Frames
out_params[4] = in_params[4] #Type
out_params[5] = in_params[5] #Compressed or not
output_file.setparams(out_params)

#reading frames from first file and storing in the second file
for frame in range(out_params[2]):
    value = int(codecs.getencoder('hex')(input_file.readframes(1))[0][:4], 16) #converting first two bytes of each frame (let assume each channel has two bytes frame length) to int (from byte string).
    t_back_value = twos_back_value( value ,out_params[1]*8)
    new_value = int(t_back_value * 1)
    new_twos = convert_to_twos(new_value, out_params[1]*8)
    to_write = new_twos.to_bytes((new_twos.bit_length() + 7) // 8, 'big')
    output_file.writeframes(to_write)


#closing files
input_file.close()
output_file.close()

The problem is that when I run the above program and play the output file, I hear only noise. I expected the same audio, just in a single channel.

Update:

I found something odd. According to the documentation, readframes(n) "reads and returns at most n frames of audio, as a string of bytes", so I expected this function to return only hex values. In practice, though, I see some strange non-hex values:

read_frame = input_file.readframes(1)
print (read_frame)
print (codecs.getencoder('hex')(read_frame)[0])
print ("")

The above code, run in a for loop, prints this:

b'\xe3\x00\xc7\xf5'
b'e300c7f5'

b'D\xe8\xa1\xfd'
b'44e8a1fd'

b'\xde\x08\xb2\x1c'
b'de08b21c'

b'\x17\xea\x10\xe9'
b'17ea10e9'

b'{\xf7\xbc\xf5'
b'7bf7bcf5'

b'*\xf6K\x08'
b'2af64b08'

As you can see, there are some non-hex characters in read_frame (for example *, {, D and K). What are these?


Solution

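  • The values you are seeing are the four bytes of each frame: two bytes for the first channel and two bytes for the second. For a mono WAV you would see only two bytes. The characters that look out of place (D, {, *, K) are still ordinary bytes: when Python prints a bytes object, it shows any byte that is a printable ASCII character as that character and every other byte as a \xNN escape. So D is the byte 0x44, as your hex dump confirms.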
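
As a quick check of that display rule, here is a minimal sketch that rebuilds the second frame from the question out of its hex dump. The variable names are only for illustration.

```python
# The second frame printed in the question, rebuilt from its hex dump.
frame = bytes.fromhex('44e8a1fd')

# repr() shows printable ASCII bytes as characters and all other bytes as \xNN.
shown = repr(frame)
print(shown)

# The letter D is simply the byte 0x44.
first_byte = frame[0]
print(hex(first_byte))
```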

    The following approach should get you on the right path. Use Python's struct module to convert the binary frame data into signed integers, which you can then manipulate as required. The code assumes 16-bit samples, and in this example each value is simply multiplied by 2/3:

    import struct
    import wave
    
    #opening files
    input_file = wave.open(r"sample.wav", 'rb')
    output_file = wave.open(r"sample_out.wav", 'wb')
    
    #Get input file parameters and set them on the output file after modifying the channel count.
    in_params = list(input_file.getparams())
    
    out_params = in_params[:]
    out_params[0] = 1
    output_file.setparams(out_params)
    
    nchannels, sampwidth, framerate, nframes, comptype, compname = in_params
    format = '<{}h'.format(nchannels)
    
    #reading frames from first file and storing in the second file
    for index in range(nframes):
        frame = input_file.readframes(1)
        data = struct.unpack(format, frame)
        value = data[0]     # first (left) channel only
        value = (value * 2) // 3    # apply a simple function to each value
        output_file.writeframes(struct.pack('<h', value))
    
    #closing files
    input_file.close()
    output_file.close()
    

    Note that processing a WAV file one frame at a time like this will be painfully slow. It could be sped up by reducing the number of calls to readframes and writeframes.
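
One way to cut down those calls is to read all frames at once, unpack them in a single struct.unpack call, and write the result with a single writeframes call. The following is only a sketch under a few assumptions: the function name left_channel_scaled is made up for illustration, the input uses 16-bit samples, and the whole file fits comfortably in memory.

```python
import struct
import wave


def left_channel_scaled(src, dst):
    """Write the left channel of a 16-bit WAV to a mono WAV, scaled by 2/3.

    src and dst may be file names or binary file objects (both work with wave.open).
    """
    with wave.open(src, 'rb') as reader:
        if reader.getsampwidth() != 2:
            raise ValueError('this sketch only handles 16-bit samples')
        nchannels = reader.getnchannels()
        framerate = reader.getframerate()
        raw = reader.readframes(reader.getnframes())   # one read for the whole file

    # Unpack every sample in one call: little-endian signed 16-bit integers.
    count = len(raw) // 2
    samples = struct.unpack('<{}h'.format(count), raw)

    # Samples are interleaved per frame, so channel 0 is every nchannels-th value.
    left = samples[::nchannels]
    scaled = [(value * 2) // 3 for value in left]

    with wave.open(dst, 'wb') as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(framerate)
        # One write for the whole file; the frame count in the header is fixed up on close.
        writer.writeframes(struct.pack('<{}h'.format(len(scaled)), *scaled))
```

It could be called as left_channel_scaled("sample.wav", "sample_out.wav"). For very large files, reading and writing in chunks of a few thousand frames would keep memory use bounded while still avoiding one call per frame.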

    format holds the format string needed to unpack the binary values. For a 2-channel, 16-bit WAV file each frame contains 4 bytes, so format will be <hh: little-endian (<), two signed 16-bit integers (h). Calling struct.unpack then returns a tuple of two integers, one for each channel.
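
To make the unpacking concrete, here is a small sketch applied to the first frame printed in the question. It also shows what happens if the same two bytes are read in big-endian order, which is effectively what the question's hex-string conversion did and is likely a large part of why the output was noise. The variable names are illustrative only.

```python
import struct

# First frame shown in the question: two channels, two bytes each.
frame = bytes.fromhex('e300c7f5')

# '<hh' = little-endian, two signed 16-bit integers.
left, right = struct.unpack('<hh', frame)
print(left, right)

# The same left-channel value without struct, for comparison.
left_again = int.from_bytes(frame[:2], 'little', signed=True)

# Reading those two bytes big-endian gives a very different sample value.
big_endian = int.from_bytes(frame[:2], 'big', signed=True)
print(big_endian)
```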