python scipy wav

Multiplying samples array by scaling factor gives ambiguous results when reading/writing .wav files


I need to manipulate some .wav files, and I am using the scipy.io.wavfile module to help me with this task.

I ran into a problem when I tried to understand how the read and write functions work.

I have a sample file input_file.wav. The code I wrote that worked as expected was:

from scipy.io import wavfile

def scale(filename):
    # Read the samples and write them back unchanged
    fs, x = wavfile.read(filename)
    wavfile.write('test_output.wav', fs, x)

scale('input_file.wav')

The input and output files looked identical when I imported them into Audacity, and sounded identical on my headphones. I ran into issues when I executed the following code.

def scale(filename):
    fs, x = wavfile.read(filename)
    x1 = x * 0.5  # scale every sample by 0.5
    wavfile.write('test_output1.wav', fs, x1)

scale('input_file.wav')

I expected the output to be half as loud, since I multiplied the value of each sample by 0.5. But when I imported it into Audacity, the file was loud to the point of severe distortion.

The same thing happened when I multiplied by 1.01, 1.0001, 0.1, and a number of other values I tried: massively boosted volume to the point of large distortions.

The file started to sound identical (and look identical when imported into Audacity) when I multiplied the sample array by a value of 1/32767 or so (which is 1/(2^15-1)). This is strange because the values in the sample array returned by the read() function are definitely not identical.

Why do the output files from the write operation sound the same when the scaling value is either 1 or 1/32767, two very different numbers?

Any help would be appreciated, thank you.

EDIT: If it helps, x.dtype (the dtype attribute of the sample array returned by read()) is int16.


Solution

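  • If x has dtype np.int16, then x1 = x * 0.5 has dtype np.float64, because NumPy promotes the product of an integer array and a float to floating point. It appears that scipy.io.wavfile.write then writes 64 bit floating point values to the file, even though the documentation only mentions 32 bit floating point formats. Floating point WAV data is conventionally interpreted with full scale at [-1, 1], so samples with magnitudes up to 32767 are far out of range and play back heavily clipped. That is also why a scaling factor of about 1/32767 sounds like the original: it maps the int16 range onto [-1, 1]. You can work around the problem by converting x1 back to int16, or by normalizing the values in x1 to the range [-1, 1] (or [-0.5, 0.5], or whatever range you want within [-1, 1]). That is, you can use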

    wavfile.write('test_output1.wav', fs, np.round(x1).astype(x.dtype))  # If x has an integer dtype
    

    or

    wavfile.write('test_output1.wav', fs, (x1/2**15).astype(np.float32))
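
    To check both workarounds without needing input_file.wav, here is a minimal sketch that uses a synthetic sine wave in place of the real file. The sample rate, tone frequency, and output file names are assumptions made up for this illustration; only wavfile.read, wavfile.write, and standard NumPy calls are used.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

fs = 8000  # assumed sample rate for the synthetic test tone
t = np.arange(fs) / fs
# Synthetic int16 sine wave standing in for input_file.wav (hypothetical data)
x = (0.8 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

# Multiplying an int16 array by a Python float yields a float64 array
x1 = x * 0.5
print(x.dtype, x1.dtype)

tmpdir = tempfile.mkdtemp()
int_path = os.path.join(tmpdir, "half_int16.wav")      # hypothetical name
float_path = os.path.join(tmpdir, "half_float32.wav")  # hypothetical name

# Workaround 1: round and cast back to the original integer dtype
wavfile.write(int_path, fs, np.round(x1).astype(x.dtype))

# Workaround 2: rescale into [-1, 1] and store as 32 bit floating point
wavfile.write(float_path, fs, (x1 / 2**15).astype(np.float32))

# Read both files back and inspect dtype and peak amplitude
fs_int, y_int = wavfile.read(int_path)
fs_float, y_float = wavfile.read(float_path)
print(y_int.dtype, int(np.abs(y_int).max()))
print(y_float.dtype, float(np.abs(y_float).max()))
```

    Both files should come back with a peak at roughly half of the original signal's peak, one on the int16 scale and one on the [-1, 1] floating point scale. Casting back to int16 keeps the file format identical to the input, while float32 avoids rounding error if more processing follows.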