
What does alsaaudio.PCM_FORMAT_MPEG do exactly?


I want to stream audio from my microphone with Python (on Linux). I used the PyAlsaAudio module, but I got stuck.

My code so far:

import alsaaudio

CHAN = 1
RATE = 44400
PERIOD = RATE * 1

inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE,
                    alsaaudio.PCM_NORMAL,
                    channels = CHAN,
                    rate = RATE,
                    format = alsaaudio.PCM_FORMAT_MPEG,
                    periodsize = PERIOD
                    )
wf = open('stream.mpeg', 'wb')
l,data = inp.read()
wf.write(data)
wf.close()

This does not throw any errors, but I can't open the output file:

$ ffplay stream.mpeg 
ffplay version 4.2.7-0ubuntu0.1 Copyright (c) 2003-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  WARNING: library configuration mismatch
  avcodec     configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared --enable-version3 --disable-doc --disable-programs --enable-libaribb24 --enable-liblensfun --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
stream.mpeg: Invalid data found when processing input    0B f=0/0

The documentation says:

Format: PCM_FORMAT_MPEG; Description: MPEG encoded audio?

I really don't know what that question mark at the end is about.


Solution

  • ALSA enumerates all its formats and has a number reserved for SND_PCM_FORMAT_MPEG. Pyalsaaudio copied all the enumerations, dropping the SND_ prefix. The PCM_FORMAT_MPEG format thus refers to SND_PCM_FORMAT_MPEG in the ALSA library.

    If we search the ALSA source code (https://github.com/search?q=org%3Aalsa-project+SND_PCM_FORMAT_MPEG&type=code) we find two hits. One is the definition; the other plays a role in the coupling to OSS (of which I know nothing), and there seems to be no use for it outside alsa-oss. I guess the person who put the question mark in the pyalsaaudio documentation observed the same thing. The ALSA documentation and code are hard to read, so I speculate that, rather than sorting this out completely, they flagged this corner case as needing attention from anyone who is interested.
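    As a rough illustration of the point about copied enumerations, the sketch below lists whatever PCM_FORMAT_* names a module exposes and prints the ALSA name each one would correspond to. The helper name pcm_format_constants is made up for this example. When alsaaudio is not installed, a stand-in namespace is used so the sketch still runs; its numbers are placeholders, not ALSA's real values.

```python
import types


def pcm_format_constants(module):
    """Return {name: value} for every attribute whose name starts with PCM_FORMAT_."""
    return {
        name: getattr(module, name)
        for name in sorted(dir(module))
        if name.startswith("PCM_FORMAT_")
    }


try:
    import alsaaudio as audio_module  # the real module, if it is installed
except ImportError:
    # Hypothetical stand-in; the values below are placeholders only
    audio_module = types.SimpleNamespace(
        PCM_FORMAT_S16_LE=0, PCM_FORMAT_MPEG=1, PCM_CAPTURE=2
    )

for name, value in pcm_format_constants(audio_module).items():
    # Per the explanation above, the ALSA name is the same with SND_ in front
    print(f"{name} = {value} (ALSA name: SND_{name})")
```

    Seeing PCM_FORMAT_MPEG in such a listing only shows that the constant exists; it says nothing about whether a capture device will actually deliver MPEG data.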

    That being said, it seems your intention is to write sound captured from a microphone to an mp3 file. There is no need to have the input device produce MPEG. The code below, which uses alsaaudio and pydub, reads from the microphone and writes to an mp3 file.

    import alsaaudio
    import numpy as np
    import struct
    import pydub 
    import time
    
    conversion_dicts = {
            alsaaudio.PCM_FORMAT_S16_LE: {'dtype': np.int16, 'endianness': '<', 'formatchar': 'h', 'bytewidth': 2},
    }
    
    def get_conversion_string(audioformat, noofsamples):
        conversion_dict = conversion_dicts[audioformat]
        conversion_string = f"{conversion_dict['endianness']}{noofsamples}{conversion_dict['formatchar']}"
        return conversion_string
    
    device = 'default'
    fs = 44100
    periodsize=512
    
    inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, 
        channels=1, rate=fs, format=alsaaudio.PCM_FORMAT_S16_LE, 
        periodsize=periodsize, device=device)
    
    print(inp.info())
    
    with open("test.mp3", 'wb') as mp3file:
        dtype = np.int16
    
        # Number of periods that add up to roughly 5 seconds of audio
        loops_with_data = int(np.ceil(5 * fs/periodsize))
        first_time = True
    
        while loops_with_data > 0:
            # Read from the device. In non-blocking mode l is 0 when no full
            # period is ready and negative after an overrun.
            l, rawdata = inp.read()
    
            if l > 0:
                # Unpack only when frames were returned; a negative l would
                # produce an invalid struct format string
                conversion_string = get_conversion_string(alsaaudio.PCM_FORMAT_S16_LE, l)
                data = np.array(struct.unpack(conversion_string, rawdata), dtype=dtype)
    
                print(f"\r{loops_with_data:4} {l=}", end='')
                if first_time:
                    # Create an empty song
                    song = pydub.AudioSegment(b'', frame_rate=fs, sample_width=2, channels=1)
                    
                    # Clear the audio buffer
                    inp.drop()
                    first_time = False
                else:
                    # smaller delay otherwise, still longer than one period length
                    song += pydub.AudioSegment(data.tobytes(), frame_rate=fs, sample_width=2, channels=1)
                
                time.sleep(.1)
                loops_with_data -= 1
            else:
                print(".", end='')
        
        song.export(mp3file, format="mp3", bitrate="320k")
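
    Two details of the code above can be checked without a sound card. This is a minimal sketch using only the standard library: synthetic samples stand in for the bytes returned by inp.read(), and the helper names format_string and periods_needed are invented for the illustration. It shows how a format string such as '<4h' decodes four little-endian signed 16-bit samples, and how the number of periods for a given duration is computed.

```python
import math
import struct


def format_string(noofsamples):
    # Same shape as get_conversion_string for S16_LE:
    # '<' means little-endian, 'h' means signed 16-bit integer
    return f"<{noofsamples}h"


def periods_needed(seconds, fs, periodsize):
    # Each period holds `periodsize` frames, so round the ratio up
    return math.ceil(seconds * fs / periodsize)


# Synthetic mono samples standing in for captured audio (assumption: 1 channel,
# so the number of frames equals the number of samples)
samples = (0, 1000, -1000, 32767)
rawdata = struct.pack(format_string(len(samples)), *samples)
decoded = struct.unpack(format_string(len(samples)), rawdata)

print(len(rawdata), decoded)
print(periods_needed(5, 44100, 512))
```

    With numpy available, np.frombuffer(rawdata, dtype='<i2') should yield the same values without going through struct.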