python pyaudio loopback portaudio pulseaudio

Read from audio output in PyAudio through loopbacks [Python record system output]

I'm writing a program that records from my speaker output using pyaudio. I am on a Raspberry Pi. I built the program while using the audio jack to play audio through some speakers, but recently have switched to using the speakers in my monitor, through HDMI. Suddenly, the program records silence.

from pyaudio import PyAudio


p = PyAudio()

print(p.get_default_input_device_info()['index'], '\n')
print(*[p.get_device_info_by_index(i) for i in range(p.get_device_count())], sep='\n\n')

The above code outputs first the index of the default input device of pyaudio, then the available devices. See the results below.

Case A:

2

{'index': 0, 'structVersion': 2, 'name': 'bcm2835 Headphones: - (hw:2,0)', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 8, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

{'index': 1, 'structVersion': 2, 'name': 'pulse', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.008684807256235827, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034807256235827665, 'defaultSampleRate': 44100.0}

{'index': 2, 'structVersion': 2, 'name': 'default', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.008684807256235827, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034807256235827665, 'defaultSampleRate': 44100.0}

If I then open a terminal, run sudo raspi-config and change the audio output to the headphone jack, I get an actual recording, not silence, and the code above prints different output.

Case B:

5

{'index': 0, 'structVersion': 2, 'name': 'vc4-hdmi-0: MAI PCM i2s-hifi-0 (hw:0,0)', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 2, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.005804988662131519, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

{'index': 1, 'structVersion': 2, 'name': 'bcm2835 Headphones: - (hw:2,0)', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 8, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

{'index': 2, 'structVersion': 2, 'name': 'sysdefault', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 128, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.005804988662131519, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

{'index': 3, 'structVersion': 2, 'name': 'hdmi', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 2, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.005804988662131519, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

{'index': 4, 'structVersion': 2, 'name': 'pulse', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.008684807256235827, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034807256235827665, 'defaultSampleRate': 44100.0}

{'index': 5, 'structVersion': 2, 'name': 'default', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.008684807256235827, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034807256235827665, 'defaultSampleRate': 44100.0}

You can see in case B that I now have access to many more devices. I've attempted recording from all three devices listed in case A, and both #0 and #1 fail as well: #1 also records silence, and #0 raises OSError: [Errno -9998] Invalid number of channels. If you look closely at case A, you'll see that #0 has ['maxInputChannels'] = 0, which explains the error: it is an output-only device.
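
The maxInputChannels check can be sketched as a small filter over the device dictionaries. This is only an illustration, not part of my program: input_capable is a made-up helper name, and CASE_A is a trimmed copy of the case A entries above with just the fields the filter needs.

```python
# Trimmed copies of the case A entries (only the fields used here).
CASE_A = [
    {"index": 0, "name": "bcm2835 Headphones: - (hw:2,0)", "maxInputChannels": 0},
    {"index": 1, "name": "pulse", "maxInputChannels": 32},
    {"index": 2, "name": "default", "maxInputChannels": 32},
]


def input_capable(devices):
    """Keep only the devices that report at least one input channel."""
    return [d for d in devices if d["maxInputChannels"] > 0]


if __name__ == "__main__":
    for d in input_capable(CASE_A):
        print(d["index"], d["name"])
```

With a live PyAudio instance, the same filter could be fed the list built from p.get_device_info_by_index(i) in the snippet above. Note that it only rules out devices that cannot record at all; it says nothing about whether a device records silence.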

I've attempted to create loopback devices that read from the sound output and expose it as another input. I would then record from that input, since it would have input channels. I found a thread on this, but the only solution there is for Windows.

I have also attempted to create a loopback device using the pulseaudio utility pactl. This link here demonstrates what I have tried. After successfully creating a loopback, I'm unable to connect to it using pyaudio; it doesn't show up in the list of devices.

Does anybody know...

How to record from a pulseaudio loopback using pyaudio?
An alternative way of creating a loopback on Linux?
An alternative way of using pyaudio to solve my problem?

Thanks very much.

Solution

This problem took a while. It turns out pyaudio is a poor fit for recording system audio, so I switched to pasimple, which has all of the benefits of pyaudio and, gasp, actually works. By benefits, I mean it is A) simple and B) free of Python dependencies (it does require pulseaudio on the system).

Below you will find my Recorder object. Keep in mind that I am on a Raspberry Pi, so my means of finding the correct output device to listen in on may not work on other systems.
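
If the lookup does not work on your system, a rough cross-check is to list the monitor sources pulseaudio exposes and compare names. The sketch below assumes the tab-separated layout printed by pactl list short sources (index, name, module, sample spec, state). The helper names and the SAMPLE listing are made up for illustration; the names on your machine will differ.

```python
import subprocess
from typing import List

# Made-up sample in the tab-separated layout assumed for `pactl list short sources`.
SAMPLE = (
    "0\talsa_output.example-card.stereo-fallback.monitor\tmodule-alsa-card.c\ts16le 2ch 44100Hz\tSUSPENDED\n"
    "1\talsa_input.example-mic.mono-fallback\tmodule-alsa-card.c\ts16le 1ch 48000Hz\tSUSPENDED\n"
)


def monitor_sources(listing: str) -> List[str]:
    """Return the source names in the listing that end in '.monitor'."""
    names = []
    for line in listing.splitlines():
        fields = line.split("\t")
        # The second column is the source name.
        if len(fields) >= 2 and fields[1].endswith(".monitor"):
            names.append(fields[1])
    return names


def list_monitor_sources() -> List[str]:
    """Run pactl and parse its output (needs pulseaudio's pactl installed)."""
    out = subprocess.check_output(["pactl", "list", "short", "sources"], text=True)
    return monitor_sources(out)


if __name__ == "__main__":
    print(monitor_sources(SAMPLE))
```

Any name this returns is a candidate for the device_name argument; the default sink's monitor is simply the default sink name with .monitor appended.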

pasimple works super well. Check out the documentation here. The tlength argument is worth looking into.

import json
import subprocess
import wave
from threading import Thread, Event

import pasimple as pa


class Recorder(Thread):
    def __init__(self) -> None:
        super().__init__()
        
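        # Ask pulseaudio for the default sink; its monitor source carries whatever is being played.
        default_sink = subprocess.check_output(['pactl', 'get-default-sink'])
        
        self.device = '{}.monitor'.format(default_sink.decode().rstrip())
        
        # List all sinks as JSON (requires a pactl that supports --format=json).
        devices = json.loads(subprocess.check_output(['pactl', '--format=json', 'list', 'sinks']))
        
        # Pick the sink whose monitor source is the one we want to record from.
        device = [device for device in devices if device['monitor_source'] == self.device][0]
        
        # The sample specification is a string such as 's16le 2ch 44100Hz'.
        specs = device['sample_specification'].split()
        
        self.audio = {}
        
        self.audio['format'] = getattr(pa, 'PA_SAMPLE_{}'.format(specs[0].upper()))
        self.audio['channels'] = int(specs[1][:-2]) # strip the 'ch' suffix
        self.audio['rate'] = int(specs[2][:-2]) # strip the 'Hz' suffix
        
        self.audio['sample-width'] = pa.format2width(self.audio['format']) # bytes per sample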
        
        self.is_recording = Event()
        self.kill = Event()
    
    def _get_sample_length(self, seconds: int) -> int:
        return self.audio['channels'] * self.audio['sample-width'] * self.audio['rate'] * seconds
    
    def _read_audio_data(self, seconds: int) -> bytes:
        return self.stream.read(self._get_sample_length(seconds))
    
    def record_to_file(self, file: str, seconds: int) -> None:
        data = self._read_audio_data(seconds)
        
        with wave.open(file, 'wb') as f:
            f.setnchannels(self.audio['channels'])
            f.setsampwidth(self.audio['sample-width'])
            f.setframerate(self.audio['rate'])
            
            f.writeframes(data)
    
    def run(self) -> None:
        self.stream = pa.PaSimple(
            direction = pa.PA_STREAM_RECORD,
            format = self.audio['format'],
            channels = self.audio['channels'],
            rate = self.audio['rate'],
            device_name = self.device,
            stream_name = 'thingamajiggy'
        )
        
        self.is_recording.set() # change state upon stream initialisation
        self.kill.wait() # await program end
        
        self.stream.flush() # release resources
        self.stream.close()


if __name__ == "__main__":
    recorder = Recorder()
    recorder.start()
    
    recorder.is_recording.wait() # wait for stream to be established
    
    recorder.record_to_file('example.wav', 10)
    
    recorder.kill.set() # kill thread, free resources
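
For anyone adapting this, the sample-spec parsing and the byte count in _get_sample_length can be checked without any audio hardware. The sketch below is a standalone illustration, not part of the Recorder: parse_sample_spec and bytes_for are made-up names, and the SAMPLE_WIDTHS table is a small hand-written stand-in for pa.format2width covering only a few formats.

```python
# Bytes per sample for a few pulseaudio format names (stand-in for pa.format2width).
SAMPLE_WIDTHS = {"u8": 1, "s16le": 2, "s24le": 3, "s32le": 4, "float32le": 4}


def parse_sample_spec(spec: str):
    """Split a spec such as 's16le 2ch 44100Hz' into (format, channels, rate)."""
    fmt, channels, rate = spec.split()
    # Drop the two-character 'ch' and 'Hz' suffixes, as the Recorder does.
    return fmt, int(channels[:-2]), int(rate[:-2])


def bytes_for(spec: str, seconds: int) -> int:
    """Number of bytes to read for the given duration of audio."""
    fmt, channels, rate = parse_sample_spec(spec)
    return channels * SAMPLE_WIDTHS[fmt] * rate * seconds


if __name__ == "__main__":
    print(parse_sample_spec("s16le 2ch 44100Hz"))
    print(bytes_for("s16le 2ch 44100Hz", 10))
```

For 16-bit stereo at 44100 Hz, ten seconds works out to 2 * 2 * 44100 * 10 = 1764000 bytes, which is the amount record_to_file reads in one call.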