Tags: python, python-3.x, audio, audio-recording, python-sounddevice

How to record audio in Python for an undetermined duration and allow pause and resume?


I'm writing a Python app to record audio as a WAV file until a user presses pause or stop. After pausing the audio, the user should also be able to resume recording. Additionally:

  • The app can't know how long the recording will be beforehand
  • The app should avoid running out of memory (since the recording could be very long). For example, it could write to the WAV file in real-time to prevent storing the growing recording in memory.

What's a good approach for this problem? Can you please provide some code snippets for your solution?

With python-sounddevice, I could stop() and start() the stream to mimic a 'pause' feature, and I can specify a numpy array as the output for the recording. But:

  • I don't know how big to make the array (since I don't know the recording duration)
  • What would I do when the array fills up?

python-sounddevice and python-soundfile together can support recordings without knowing the size beforehand. But:

  • How would I incorporate 'pause' and 'resume' features? soundfile has only read and write methods.
  • Is there a better way to stop the stream than using a KeyboardInterrupt?
  • Could I create a separate recording after every 'pause' and combine the WAV files after the user clicks 'stop'?
  • I tried using threading.Event() to block the recording thread to mimic a pause feature, but the recording kept writing to the file

My attempt at the sounddevice approach:

paused = False

def record(self):
    self.recording = ? # create numpy.ndarray of the correct size
                       # (not sure the best way to do this without
                       # knowing the recording duration)
    with sd.InputStream(samplerate=44100, device=mic, channels=1,
                        callback=self.callback):
        while self.paused:
            sd.stop()
        sd.rec(out=self.recording) # but what happens if
                                   # recording is very long
                                   # or numpy array fills up?

def stop_and_save(self):
    sd.stop()
    scipy.io.wavfile.write("recording.wav", 44100, self.recording)


The sounddevice and soundfile approach:

try:
    with sf.SoundFile(args.filename, mode='x', samplerate=args.samplerate,
                      channels=args.channels, subtype=args.subtype) as file:
        with sd.InputStream(samplerate=args.samplerate, device=args.device,
                            channels=args.channels, callback=callback):
            print('press Ctrl+C to stop the recording')
            while True:
                file.write(q.get())  # but how do you stop writing when 'paused'?

except KeyboardInterrupt:
    print('\nRecording finished: ' + repr(args.filename))
    parser.exit(0)
except Exception as e:
    parser.exit(type(e).__name__ + ': ' + str(e))

Solution

  • I came up with this solution for the pause/resume feature. It builds on the sounddevice and soundfile approach: the current recording is stopped whenever the user clicks Pause, and a new recording is started upon Resume. Then, after the user clicks Stop, all the WAV files are combined in order.

    (Matthias' code also looks like a fine solution that takes more advantage of threads.)


    To Start recording audio:

        def record(self):
            try:
                # Keep the open file on self so the pause/stop methods can flush and close it
                with sf.SoundFile(self.filepath,
                                  mode='x', samplerate=self.SAMPLE_RATE,
                                  channels=self.CHANNELS, subtype=None) as self.sound_file:
                    with sd.InputStream(samplerate=self.SAMPLE_RATE, device=self.mic_id,
                                        channels=self.CHANNELS, callback=self.callback):
                        logger.info(f"New recording started: {self.sound_file.name}")
                        # Write each block as soon as it arrives. The loop ends when the
                        # file is closed elsewhere and write() raises RuntimeError.
                        while True:
                            self.sound_file.write(self.mic_queue.get())

            except RuntimeError as re:
                logger.debug(f"{re}. If recording was stopped by the user, then this can be ignored")
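
The write loop above is what keeps memory use flat: each block goes to the file as soon as it is dequeued, so no array has to be sized in advance. Below is a minimal sketch of the same idea using only the standard-library `wave` module, swapped in for soundfile so it runs without audio hardware. `stream_to_wav` and `fake_blocks` are hypothetical names, and the silent blocks stand in for microphone input.

```python
import os
import tempfile
import wave


def stream_to_wav(path, blocks, samplerate=44100, channels=1, sampwidth=2):
    """Write raw PCM blocks to a WAV file one at a time."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sampwidth)      # 2 bytes per sample = 16-bit audio
        wav.setframerate(samplerate)
        for block in blocks:             # any iterable, e.g. a generator draining a queue
            wav.writeframes(block)       # only the current block is held in memory
    # Leaving the with-block closes the file and finalizes the WAV header.


def fake_blocks(n_blocks, frames_per_block=1024):
    """Stand-in for microphone input: yields blocks of 16-bit mono silence."""
    for _ in range(n_blocks):
        yield b"\x00\x00" * frames_per_block


path = os.path.join(tempfile.mkdtemp(), "demo.wav")
stream_to_wav(path, fake_blocks(10))

with wave.open(path, "rb") as wav:
    total_frames = wav.getnframes()
print(total_frames)
```

Because the blocks come from a generator, the recording length never has to be known up front; the frame count is only fixed when the file is closed.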
    

    Callback for record():

    
        def callback(self, indata, frames, time, status):
            """This is called (from a separate thread) for each audio block."""
            if status:
                print(status, file=sys.stderr)
            self.mic_queue.put(indata.copy())
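
As an alternative to splitting the recording into parts, here is a sketch under stated assumptions: keep one file open and have the callback discard blocks while a `threading.Event` is set. One likely reason that blocking only the writer thread does not pause a recording is that the callback keeps filling the queue, and those blocks are written once the thread is released. `PausableWriter`, `ListSink`, and the stop sentinel are hypothetical names. The callback takes the same four arguments as the one above, so it could in principle be passed to `sd.InputStream`, but here it is driven by hand with plain lists.

```python
import queue
import threading


class PausableWriter:
    _STOP = object()                      # sentinel that tells run() to finish

    def __init__(self, sink):
        self.sink = sink                  # anything with write(block), e.g. an open sound file
        self.blocks = queue.Queue()
        self.paused = threading.Event()

    def callback(self, indata, frames=None, time=None, status=None):
        """Called once per audio block; drops the block while paused."""
        if not self.paused.is_set():
            self.blocks.put(indata.copy())

    def run(self):
        """Writer loop: runs in its own thread until stop() is called."""
        while True:
            block = self.blocks.get()
            if block is self._STOP:
                break
            self.sink.write(block)

    def stop(self):
        self.blocks.put(self._STOP)       # queued after any blocks still pending


class ListSink:
    """Stand-in for a sound file: remembers what was written."""
    def __init__(self):
        self.written = []

    def write(self, block):
        self.written.append(block)


sink = ListSink()
writer = PausableWriter(sink)
thread = threading.Thread(target=writer.run)
thread.start()

writer.callback([1, 2])                   # recorded
writer.paused.set()                       # "Pause"
writer.callback([3, 4])                   # discarded
writer.paused.clear()                     # "Resume"
writer.callback([5, 6])                   # recorded
writer.stop()                             # "Stop", no KeyboardInterrupt needed
thread.join()
print(sink.written)
```

The sentinel also gives a way to end the write loop from a button handler instead of relying on Ctrl+C.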
    

    To Pause:

        def pause_recording(self):
            """Mimics a 'pause' by flushing the current sound file's pending changes to disk.
            Upon 'resume' a new recording will be made. Note: close() is not called here, because
            that would kill the recording thread
            """
            self.sound_file.flush()
            logger.info(f"'Paused' (flushed) recording: {self.sound_file.name}")
    

    To Resume:

        def resume_recording(self):
            """
            Mimics 'resuming' by starting a new recording, which will be merged with the others
            when the user selects Stop & Save (or deleted upon Stop & Delete)
            Note: get_full_sound_file_name() returns a new file name with the same base name as
            the first recording, plus a `_part2`, `_part3`, etc. suffix to distinguish it from
            the first and maintain order.
            """
            # record() opens self.filepath, so point it at the next part before recording
            self.filepath = self.get_full_sound_file_name()
            self.record()
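
The body of get_full_sound_file_name() is not shown, so the following is only a guess at what such a helper might look like. `part_filename` is a hypothetical name, and the zero-padded part number is an assumption that departs from the `_part2` style in the docstring: padding keeps `_part010` after `_part002` when the names are sorted alphabetically, which the combining step relies on.

```python
import os


def part_filename(first_path, part_number, width=3):
    """Return the file name for the given part of a recording.

    Part 1 keeps the original name; later parts get a zero-padded suffix,
    e.g. 'rec.wav' -> 'rec_part002.wav'.
    """
    if part_number == 1:
        return first_path
    base, ext = os.path.splitext(first_path)
    return f"{base}_part{part_number:0{width}d}{ext}"


names = [part_filename("rec.wav", n) for n in (1, 2, 10)]
print(names)
```

Without the padding, `rec_part10.wav` would sort before `rec_part2.wav` and the parts would be merged out of order.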
    

    To Stop recording:

        def stop_mic_recording(self):
            try:
                self.sound_file.flush()
                self.sound_file.close()
                logger.info(f"Stopped and closed recording: {self.sound_file.name}")
    
            except RuntimeError as e:
                logger.error(f"Error stopping/saving {self.sound_file.name}. Make sure the file exists and can be modified")
                logger.error(f"RuntimeError: \n{e}")
    

    To combine audio (called after stop_mic_recording()):

        import glob
        import os
        from pydub import AudioSegment

        def combine_recordings_if_needed(self):
            """
            If the recording was paused, combines all sections in alphabetical order into a new audio file
            """
            if self.section_count > 1:   # this is incremented when a recording is paused/resumed
                combined_audio = AudioSegment.empty()
                files_combined = []
                # glob.glob() returns files in arbitrary order, so sort them explicitly
                for rec in sorted(glob.glob(os.path.join(RECORDING_DIR, "*" + self.FILE_EXT))):
                    combined_audio = combined_audio + AudioSegment.from_wav(rec) # this is why alphabetical order is important
                    files_combined.append(rec)
    
                combined_file_name = os.path.join(RECORDING_DIR, self.base_filename + "_combined" + self.FILE_EXT)
                combined_audio.export(out_f=combined_file_name, format="wav")
                logger.info(f"Combined the following recordings into {combined_file_name}:"
                            f"\n {files_combined}")
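
The pydub version above builds the whole combined recording in memory before exporting it, which may matter for very long recordings. Here is a minimal sketch of a chunk-by-chunk alternative using the standard-library `wave` module, assuming all parts are uncompressed PCM WAV files with the same channel count, sample width, and sample rate. `combine_wavs` and `write_silence` are hypothetical names, and the silent test files stand in for real recordings.

```python
import os
import tempfile
import wave


def combine_wavs(part_paths, out_path, chunk_frames=65536):
    """Concatenate WAV parts that share one format, copying chunk by chunk."""
    if not part_paths:
        raise ValueError("no parts to combine")
    fmt = None
    with wave.open(out_path, "wb") as out:
        for part in part_paths:
            with wave.open(part, "rb") as src:
                part_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if fmt is None:
                    # The first part decides the output format
                    fmt = part_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif part_fmt != fmt:
                    raise ValueError(f"{part} has a different format: {part_fmt} != {fmt}")
                while True:
                    frames = src.readframes(chunk_frames)
                    if not frames:       # empty bytes means this part is exhausted
                        break
                    out.writeframes(frames)


def write_silence(path, n_frames, samplerate=44100):
    """Create a 16-bit mono WAV of silence, used here only as test input."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(samplerate)
        wav.writeframes(b"\x00\x00" * n_frames)


tmp = tempfile.mkdtemp()
parts = [os.path.join(tmp, name) for name in ("rec.wav", "rec_part002.wav")]
write_silence(parts[0], 100)
write_silence(parts[1], 50)

combined = os.path.join(tmp, "rec_combined.wav")
combine_wavs(parts, combined)             # parts are passed in the order to merge

with wave.open(combined, "rb") as wav:
    combined_frames = wav.getnframes()
print(combined_frames)
```

Passing an explicit, ordered list of parts also avoids picking up unrelated WAV files (such as an earlier combined file) that a directory-wide glob would match.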