
Python How to convert pyaudio bytes into virtual file?


In Short

Is there a way to convert raw audio data (obtained with the PyAudio module) into a virtual file object (like the one returned by Python's open() function), without saving it to disk and reading it back? Details are provided below.

What Am I Doing

I'm using PyAudio to record audio, which is then fed into a TensorFlow model to get a prediction. Currently it works if I first save the recorded sound as a .wav file on disk and then read it back to feed it into the model. Here is the recording and saving code:

import pyaudio
import wave

CHUNK_LENGTH = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 1

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK_LENGTH)

print("* recording")
frames = [stream.read(RATE * RECORD_SECONDS)]  # here is the recorded data, in the form of list of bytes
print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

After I get the raw audio data (the frames variable), it can be saved with Python's wave module as below. Note that when saving, some metadata (channels, sample width, frame rate) must be set by calling the wf.setxxx functions.

import os
from datetime import datetime

output_dir = "data/"
output_path = output_dir + "{:%Y%m%d_%H%M%S}.wav".format(datetime.now())

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# save the recorded data as wav file using python `wave` module
wf = wave.open(output_path, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

And here is the code that uses the saved file to run inference on the TensorFlow model. It simply reads the file as binary and the model handles the rest.

import classifier  # my tensorflow model

with open(output_path, 'rb') as f:
    w = f.read()
    classifier.run_graph(w, labels, 5)

THE PROBLEM

For real-time needs, I have to keep streaming audio and feeding it into the model every so often. But it seems unreasonable to keep saving the file to disk and reading it back again and again, which wastes a lot of time on I/O.

I want to keep the data in memory and use it directly, rather than saving and reading it repeatedly. However, Python's wave module does not support reading and writing simultaneously (see here).

If I directly feed the data without the metadata (e.g. channels, frame rate) that the wave module adds during saving, like this:

w = b''.join(frames)
classifier.run_graph(w, labels, 5)

I get an error like this:

2021-04-07 11:05:08.228544: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected RIFF but found 
Traceback (most recent call last):
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Header mismatch: Expected RIFF but found

The TensorFlow model I'm using is provided here: ML-KWS-for-MCU, hope this helps. Here is the code that produces the error (classifier.run_graph()):

def run_graph(wav_data, labels, num_top_predictions):
    """Runs the audio data through the graph and prints predictions."""
    with tf.Session() as sess:
        #   Feed the audio data as input to the graph.
        #   predictions  will contain a two-dimensional array, where one
        #   dimension represents the input image count, and the other has
        #   predictions per class
        softmax_tensor = sess.graph.get_tensor_by_name("labels_softmax:0")
        predictions, = sess.run(softmax_tensor, {"wav_data:0": wav_data})

        # Sort to show labels in order of confidence
        top_k = predictions.argsort()[-num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = labels[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))

        return 0

Solution

  • You should be able to use io.BytesIO instead of a physical file; it exposes the same interface as a file object but is kept entirely in memory:

    import io
    container = io.BytesIO()
    wf = wave.open(container, 'wb')
    wf.setnchannels(4)      # dummy parameters, just for illustration
    wf.setsampwidth(4)
    wf.setframerate(4)
    wf.writeframes(b'abcdef')
    
    # Read the data up to this point
    container.seek(0)
    data_package = container.read()
    
    # add some more data...
    wf.writeframes(b'ghijk')
    
    # read the data added since last
    container.seek(len(data_package))
    data_package = container.read()
    

    This should allow you to continuously stream data into the in-memory file while reading the newly written excess with your TensorFlow code.
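    Putting the pieces together, here is a minimal per-utterance sketch of the in-memory approach: build a fresh BytesIO-backed WAV for each recorded chunk and pass its bytes straight to the classifier, with no disk I/O. This assumes, as in the question, that classifier.run_graph() accepts complete WAV-encoded bytes (the decode_wav op expects a RIFF header, which is exactly what wave writes):

    ```python
    import io
    import wave

    def frames_to_wav_bytes(frames, channels=1, sampwidth=2, rate=16000):
        """Wrap raw PCM frames in a WAV header, entirely in memory."""
        container = io.BytesIO()
        wf = wave.open(container, 'wb')
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)   # paInt16 -> 2 bytes per sample
        wf.setframerate(rate)
        wf.writeframes(b''.join(frames))
        wf.close()                   # close() patches the RIFF size fields
        return container.getvalue()  # bytes of a complete WAV file

    # In the recording loop, this would replace the save-and-reload step:
    # w = frames_to_wav_bytes(frames, CHANNELS, p.get_sample_size(FORMAT), RATE)
    # classifier.run_graph(w, labels, 5)
    ```

    Because wf.close() rewrites the RIFF size fields in the header, getvalue() must be called after closing; the resulting bytes then begin with the b'RIFF' magic that decode_wav checks for.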