I want to stream audio from my microphone with Python (on Linux). I used the PyAlsaAudio module, but I got stuck.
My code so far:
import alsaaudio
CHAN = 1
RATE = 44400
PERIOD = RATE * 1
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE,
    alsaaudio.PCM_NORMAL,
    channels = CHAN,
    rate = RATE,
    format = alsaaudio.PCM_FORMAT_MPEG,
    periodsize = PERIOD
)
wf = open('stream.mpeg', 'wb')
l,data = inp.read()
wf.write(data)
wf.close()
This does not throw any errors, but I can't open the output file:
$ ffplay stream.mpeg
ffplay version 4.2.7-0ubuntu0.1 Copyright (c) 2003-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
WARNING: library configuration mismatch
avcodec configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared --enable-version3 --disable-doc --disable-programs --enable-libaribb24 --enable-liblensfun --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
stream.mpeg: Invalid data found when processing input 0B f=0/0
The documentation says:
Format: PCM_FORMAT_MPEG; Description: MPEG encoded audio?
I really don't know what that question mark at the end means.
ALSA enumerates all its formats and has a number reserved for SND_PCM_FORMAT_MPEG. Pyalsaaudio copied all the enumerations, dropping the SND_ prefix. PCM_FORMAT_MPEG thus refers to SND_PCM_FORMAT_MPEG in the ALSA library.
If we search the ALSA source code (https://github.com/search?q=org%3Aalsa-project+SND_PCM_FORMAT_MPEG&type=code) we find two hits. One is the definition; the other plays a role in the coupling to OSS (about which I know nothing), and there seems to be no role for it outside alsa-oss. I guess whoever put the question mark in the pyalsaaudio documentation observed the same thing. The ALSA documentation and code are hard to read, so I speculate that, rather than sorting this out completely, the author marked this corner case as needing attention from anyone who is interested.
That being said, it seems your intention is to write sound captured from a microphone to an MP3 file. There is no need to have the input device produce MPEG. The code below, which uses alsaaudio and pydub, reads from the microphone and writes to an MP3 file.
import alsaaudio
import numpy as np
import struct
import pydub
import time
conversion_dicts = {
    alsaaudio.PCM_FORMAT_S16_LE: {'dtype': np.int16, 'endianness': '<', 'formatchar': 'h', 'bytewidth': 2},
}

def get_conversion_string(audioformat, noofsamples):
    # Build a struct format string such as "<512h" for the given sample format
    conversion_dict = conversion_dicts[audioformat]
    conversion_string = f"{conversion_dict['endianness']}{noofsamples}{conversion_dict['formatchar']}"
    return conversion_string
device = 'default'
fs = 44100
periodsize=512
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK,
                    channels=1, rate=fs, format=alsaaudio.PCM_FORMAT_S16_LE,
                    periodsize=periodsize, device=device)
print(inp.info())
with open("test.mp3", 'wb') as mp3file:
    dtype = np.int16
    # Number of periods that cover about 5 seconds of audio
    loops_with_data = int(np.ceil(5 * fs/periodsize))
    first_time = True
    while loops_with_data > 0:
        # Read data from device; in non-blocking mode l is 0 when no data is ready
        l, rawdata = inp.read()
        if l > 0:
            # Unpack only when frames arrived (l may be negative on an overrun)
            conversion_string = get_conversion_string(alsaaudio.PCM_FORMAT_S16_LE, l)
            data = np.array(struct.unpack(conversion_string, rawdata), dtype=dtype)
            print(f"\r{loops_with_data:4} {l=}", end='')
            if first_time:
                # Create an empty song
                song = pydub.AudioSegment(b'', frame_rate=fs, sample_width=2, channels=1)
                # Clear the audio buffer
                inp.drop()
                first_time = False
            else:
                # Append this period, then pause briefly (still longer than one period)
                song += pydub.AudioSegment(data.tobytes(), frame_rate=fs, sample_width=2, channels=1)
                time.sleep(.1)
            loops_with_data -= 1
        else:
            print(".", end='')
    song.export(mp3file, format="mp3", bitrate="320k")
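If you want to check the byte-to-sample conversion and the loop count without a sound card, here is a minimal sketch using only the standard library. It assumes mono S16_LE data, as in the code above, so one frame is one 2-byte sample. The helper names unpack_s16_le and periods_for are made up for this illustration; they are not part of alsaaudio or pydub.

```python
import math
import struct


def unpack_s16_le(rawdata: bytes) -> tuple:
    """Decode little-endian signed 16-bit mono samples from raw bytes."""
    nsamples = len(rawdata) // 2  # assumption: 2 bytes per sample, 1 channel
    return struct.unpack(f"<{nsamples}h", rawdata[:nsamples * 2])


def periods_for(seconds: float, rate: int, periodsize: int) -> int:
    """Number of period-sized reads needed to cover `seconds` of audio."""
    return math.ceil(seconds * rate / periodsize)


# Three hand-made samples standing in for the bytes of one inp.read() call.
raw = struct.pack("<3h", 0, 1000, -1000)
samples = unpack_s16_le(raw)
print(samples)                     # (0, 1000, -1000)
print(periods_for(5, 44100, 512))  # 431
```

With fs = 44100 and periodsize = 512, five seconds works out to 431 reads, which is the value loops_with_data starts from.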