Tags: python, docker, audio, text-to-speech, pyttsx3

Running pyttsx3 (espeak) text-to-speech in a Docker container produces awful sound quality


I am trying to run pyttsx3 (which runs on espeak) and create .mp3 files with it in Python 3.10.
The problem is that the created audio files have truly terrible, almost unintelligible sound quality, as can be heard here: https://vocaroo.com/15u2rs6hOJXR
This problem only occurs when building the app as a Docker image and then running that image using docker run mybot:latest. When running the app locally, everything works fine.

The Dockerfile I am using is:

# syntax=docker/dockerfile:1
FROM python:3.10-slim-buster
ENV PATH=/usr/local/bin:$PATH
# Set the working directory up front; a separate "RUN cd /bot" has no effect on later steps
WORKDIR /bot
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY . /bot
RUN apt-get update && apt-get install -y \
    fonts-liberation \
    libasound2 \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libatspi2.0-0 \
    libcups2 \
    libdbus-1-3 \
    libdrm2 \
    libgbm1 \
    libgtk-3-0 \
    libnspr4 \
    libnss3 \
    libwayland-client0 \
    libxcomposite1 \
    libxdamage1 \
    libxfixes3 \
    libxkbcommon0 \
    libxrandr2 \
    xdg-utils \
    libu2f-udev \
    libvulkan1 \
    espeak \
    ffmpeg \
    alsa-utils \
    libespeak1 \
    curl
RUN curl -LO https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
RUN apt-get install -y ./google-chrome-stable_current_amd64.deb
RUN rm google-chrome-stable_current_amd64.deb
WORKDIR /bot
EXPOSE 3308
CMD [ "python3", "appHandler.py", "start", "dev" ]

There are no errors or warnings logged anywhere.

Does anyone know what could be the problem here? I haven't found anything on this topic so far...


Solution

  • I "fixed" this issue and will leave this here in case anyone else ever encounters this problem:

    Alright, so after another day of research and trying stuff out I found out why this is the case.
    pyttsx3 needs a speech engine (driver) to turn text into speech. If none is specified, it uses the default driver for the operating system. Since I was running on macOS it defaulted to NSSpeechSynthesizer, but inside the Docker container the OS is Linux, so the default driver became espeak.
    NSSpeechSynthesizer and espeak differ hugely in quality: espeak sounds like a drunk robot.
    There is no way to use NSSpeechSynthesizer in Docker, as it is exclusive to macOS.
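
To make the driver difference concrete, here is a minimal sketch of how one might check which driver to expect in each environment. It assumes the platform defaults described in the pyttsx3 documentation (nsss on macOS, sapi5 on Windows, espeak everywhere else). The names DEFAULT_DRIVERS, default_driver_for and describe_engine are hypothetical helpers for illustration, not part of pyttsx3.

```python
import sys

# Assumption: mirrors the platform defaults described in the pyttsx3 docs.
# This table is illustrative and is not something pyttsx3 exposes itself.
DEFAULT_DRIVERS = {"darwin": "nsss", "win32": "sapi5"}


def default_driver_for(platform_name: str) -> str:
    """Return the driver name pyttsx3 is documented to pick for a platform."""
    # Anything that is not macOS or Windows (e.g. a Linux container) gets espeak.
    return DEFAULT_DRIVERS.get(platform_name, "espeak")


def describe_engine() -> str:
    """Initialise pyttsx3 with its default driver and list the available voices."""
    import pyttsx3  # imported lazily so the helper above works without it

    engine = pyttsx3.init()  # no driverName given, so the platform default applies
    voices = engine.getProperty("voices")
    names = ", ".join(str(voice.name) for voice in voices)
    return f"{default_driver_for(sys.platform)}: {names}"


if __name__ == "__main__":
    print("expected driver:", default_driver_for(sys.platform))
```

Running this on the host and again inside the container should show the mismatch: the host reports nsss, while the container reports espeak.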
    Since I didn't like the result of espeak at all, I decided to switch from pyttsx3 to AWS Polly, which has amazing quality, is easy to use and is very cheap for small developers like myself.
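
For anyone going the same route, a rough sketch of generating an .mp3 with Polly through boto3 could look like the following. This is not the exact code from my bot. The helper names synthesize_to_mp3 and make_polly_client, the voice "Joanna" and the region "us-east-1" are assumptions chosen for illustration.

```python
from pathlib import Path


def synthesize_to_mp3(polly_client, text: str, out_path: str, voice_id: str = "Joanna") -> int:
    """Request MP3 audio for `text` from Polly and write it to `out_path`.

    Returns the number of bytes written.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,  # assumption: any voice available to your account works here
    )
    # The response carries the audio as a stream; read it fully and save it.
    audio = response["AudioStream"].read()
    return Path(out_path).write_bytes(audio)


def make_polly_client(region_name: str = "us-east-1"):
    """Create a Polly client; boto3 is imported lazily so the helper above stays testable."""
    import boto3

    return boto3.client("polly", region_name=region_name)
```

Usage would be along the lines of synthesize_to_mp3(make_polly_client(), "Hello there", "hello.mp3"). Inside a container, the AWS credentials still have to be provided somehow, for example through environment variables passed to docker run.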