Is there a way to convert raw audio data (obtained with the PyAudio
module) into a virtual file (something that behaves like the object returned by Python's open()
function), without saving it to disk and reading it back? Details are provided below.
I'm using PyAudio
to record audio, which is then fed into a TensorFlow model to get a prediction. Currently it works if I first save the recorded sound as a .wav
file on disk and then read it back to feed into the model. Here is the recording and saving code:
import pyaudio
import wave
CHUNK_LENGTH = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 1
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK_LENGTH)
print("* recording")
frames = [stream.read(RATE * RECORD_SECONDS)]  # the recorded data, as a list of bytes objects
print("* done recording")
stream.stop_stream()
stream.close()
p.terminate()
After I get the raw audio data (the variable frames
), it can be saved using the Python wave
module as below. Note that when saving, some metadata must be set by calling the wf.set*
functions.
import os
from datetime import datetime

output_dir = "data/"
output_path = output_dir + "{:%Y%m%d_%H%M%S}.wav".format(datetime.now())
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
# save the recorded data as wav file using python `wave` module
wf = wave.open(output_path, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
And here is the code that uses the saved file to run inference with the TensorFlow model. It simply reads the file as binary and the model handles the rest.
import classifier # my tensorflow model
with open(output_path, 'rb') as f:
    w = f.read()
classifier.run_graph(w, labels, 5)
For real-time use, I need to keep streaming audio and feeding it into the model every so often. But it seems unreasonable to keep saving the file to disk and reading it back again and again, which wastes a lot of time on I/O.
I want to keep the data in memory and use it directly, rather than saving and reading it repeatedly. However, the Python wave
module does not support reading and writing simultaneously (see here).
If I directly feed the data without the metadata (e.g. channels, frame rate) that the wave
module adds during saving, like this:
w = b''.join(frames)
classifier.run_graph(w, labels, 5)
I get the following error:
2021-04-07 11:05:08.228544: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected RIFF but found
Traceback (most recent call last):
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\Users\anaconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Header mismatch: Expected RIFF but found
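The error message comes from TensorFlow's decode_wav op, which checks for the four-byte RIFF magic at the start of its input; raw PCM frames don't have it. A small stdlib-only sketch (with made-up sample data) shows the difference between raw frames and wave-module output:

```python
import io
import wave

raw = b'\x00\x01' * 100  # raw PCM frames, no container/header

# Wrap the same frames in a WAV container, entirely in memory
container = io.BytesIO()
wf = wave.open(container, 'wb')
wf.setnchannels(1)
wf.setsampwidth(2)
wf.setframerate(16000)
wf.writeframes(raw)
wf.close()

print(raw[:4])                   # no RIFF magic: b'\x00\x01\x00\x01'
print(container.getvalue()[:4])  # WAV output starts with b'RIFF'
```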
The TensorFlow model I'm using is provided here: ML-KWS-for-MCU, hope this helps.
Here is the code that produces the error (classifier.run_graph()
):
def run_graph(wav_data, labels, num_top_predictions):
    """Runs the audio data through the graph and prints predictions."""
    with tf.Session() as sess:
        # Feed the audio data as input to the graph.
        # predictions will contain a two-dimensional array, where one
        # dimension represents the input image count, and the other has
        # predictions per class
        softmax_tensor = sess.graph.get_tensor_by_name("labels_softmax:0")
        predictions, = sess.run(softmax_tensor, {"wav_data:0": wav_data})

        # Sort to show labels in order of confidence
        top_k = predictions.argsort()[-num_top_predictions:][::-1]
        for node_id in top_k:
            human_string = labels[node_id]
            score = predictions[node_id]
            print('%s (score = %.5f)' % (human_string, score))
        return 0
You should be able to use io.BytesIO instead of a physical file; they share the same interface, but a BytesIO object is kept entirely in memory:
import io
container = io.BytesIO()
wf = wave.open(container, 'wb')
wf.setnchannels(4)
wf.setsampwidth(4)
wf.setframerate(4)
wf.writeframes(b'abcdef')
# Read the data up to this point
container.seek(0)
data_package = container.read()
# add some more data...
wf.writeframes(b'ghijk')
# read the data added since last
container.seek(len(data_package))
data_package = container.read()
This should allow you to continuously stream data into the in-memory file while reading the newly written portion with your TensorFlow code.
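Tying this back to the original code, here is a sketch of producing a complete, valid in-memory WAV with the question's parameters. One second of silence stands in for the stream.read() output, and the sample width is hard-coded to 2 bytes (what p.get_sample_size(pyaudio.paInt16) returns); the classifier.run_graph call is left commented out since it needs the asker's model:

```python
import io
import wave

CHANNELS = 1
RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample, matches pyaudio.paInt16

container = io.BytesIO()
wf = wave.open(container, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(SAMPLE_WIDTH)
wf.setframerate(RATE)

# One second of silent 16-bit audio as a stand-in for
# stream.read(RATE * RECORD_SECONDS)
frames = [b'\x00\x00' * RATE]
wf.writeframes(b''.join(frames))

# Rewind and read the whole buffer: header + data, no disk involved
container.seek(0)
wav_bytes = container.read()

# wav_bytes now starts with a valid RIFF header, so it can be fed to
# the model exactly like the contents of a saved .wav file:
# classifier.run_graph(wav_bytes, labels, 5)
```

Note that re-reading from offset 0 (rather than only the newly appended frames) is what keeps the RIFF header at the front of the bytes handed to decode_wav.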