python python-3.x multithreading runtime-error pyttsx3

RuntimeError when using threading in a while loop

I am working on a vision-based American Sign Language converter application using MediaPipe and OpenCV in Python. As you may know, a webcam application uses a while loop for continuous hand detection. The problem was that when I added an audio function for the detected gesture using pyttsx3, the webcam always got stuck while the pyttsx3 engine function was executing. To resolve that, I used threading: I made a thread for the audio part and put it inside the webcam while loop. But now the error "RuntimeError: loop is already started" appears continuously in the terminal.

I have made a minimal example of my problem. I know the cause is that the loop starts the speak function again and again, and inside speak I call the runAndWait() method, which raises the error that the loop is already started. Please look into this. I have been looking for a solution for two weeks and have tried everything I could think of.

import pyttsx3
from threading import Thread
import random


engine = pyttsx3.init()
def speak(text):
    engine.say(text)
    engine.runAndWait()

while True:
    l = ['A', 'B', 'C', 'D', 'E']
    a = random.choice(l)
    print(a)
    
    t = Thread(target=speak, args=(a,))
    t.start()
    #engine.stop()

Solution

As you mentioned, the problem is that the loop starts a new thread that calls speak on every iteration. This causes two issues. First, you create hundreds of Thread instances per second, which is highly inefficient. Second, all of those threads share one engine, so every runAndWait() call made while an earlier one is still running raises the error, and the requests pile up into a backlog of thousands of calls that would take hours to finish. Nevertheless, you can change your code so it runs the way you intend:

import pyttsx3
from threading import Thread
import random
from queue import Queue


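def speak(q):
    # Create the engine inside the worker so only this thread ever uses it.
    engine = pyttsx3.init()

    while True:
        # q.get() blocks until an item is available, so the thread
        # sleeps instead of spinning on q.empty().
        a = q.get()
        engine.say(a)
        engine.runAndWait()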

queue = Queue()
t = Thread(target=speak, args=(queue,))
t.start()

while True:
    l = ['A', 'B', 'C', 'D', 'E']
    a = random.choice(l)
    queue.put(a)
    print(a)

Essentially, you create a single new thread that communicates with pyttsx3. On each iteration, you put whatever you want read aloud into the queue, and the worker thread takes the items one by one (first come, first served) and speaks them. This needs only one extra thread, so you no longer create hundreds of Thread instances.
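The single-worker pattern described above can be sketched with a clean shutdown. This is only an illustration: the `STOP` sentinel, the `worker` and `run_demo` names, and the `speak_fn` callback are assumptions of this sketch, not part of pyttsx3. Passing the speech function in as a parameter lets the worker run without audio hardware. In the real application the callback would wrap `engine.say` and `engine.runAndWait`.

```python
from queue import Queue
from threading import Thread

STOP = None  # hypothetical sentinel: tells the worker to exit


def worker(q, speak_fn):
    """Consume items from q in FIFO order until the STOP sentinel arrives."""
    while True:
        item = q.get()      # blocks until an item is available
        if item is STOP:
            break
        speak_fn(item)


def run_demo(items):
    """Feed items through one worker thread and return what was 'spoken'."""
    spoken = []
    q = Queue()
    t = Thread(target=worker, args=(q, spoken.append))
    t.start()
    for item in items:
        q.put(item)
    q.put(STOP)             # ask the worker to finish
    t.join()                # wait until everything queued has been handled
    return spoken
```

Because a single thread consumes the queue, the items come out in the order they were put in, and `t.join()` gives the main loop a way to stop the worker instead of leaving it running forever.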

However, this method still creates a backlog of thousands of utterances, because the loop produces items far faster than they can be spoken. I do not have experience with ASL analysis, but I would approach the problem by using a separate thread to analyze the last 2 to 5 seconds of video and, if a hand gesture is detected, trigger the speak command. This puts the speech a couple of seconds behind the video, which is completely fine in this scenario.
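Another way to cap the backlog, different from the batching idea above, is to bound the queue and drop new items while one is still waiting. The sketch below uses the standard library's `Queue(maxsize=...)` and `put_nowait`. The `offer` helper and the `pending` name are hypothetical, and whether dropping letters is acceptable depends on the application.

```python
from queue import Queue, Full


def offer(q, item):
    """Try to enqueue item without blocking; return False if it was dropped."""
    try:
        q.put_nowait(item)
        return True
    except Full:
        return False


pending = Queue(maxsize=1)   # at most one utterance may wait at a time
results = [offer(pending, letter) for letter in ['A', 'B', 'C']]
```

Here nothing consumes the queue, so only the first letter is accepted. With a worker thread draining `pending`, new letters are accepted again as soon as the waiting one has been taken.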

EDIT: a better way to approach this problem, in pseudocode:

fps = 30         // Number of frames per second of the video (assumed 30)
sec = 2          // Number of seconds covered by each analysis
counter = 0      // Counts the frames since the previous analysis
frameBatch = []  // Holds the frames of the last 2 seconds that need to be
                 // analyzed. It's better to use *NumPy*
while true:
    frame = *Get the frame from the webcam*
    frameBatch.append(frame)         // Adds the current frame to the batch
    counter += 1
    if counter == fps * sec:
        analyseAndSpeak(frameBatch)  // This method analyzes the last
                                     // 2 seconds and then sends the
                                     // result to the speak engine
        counter = 0                  // Resets the counter
        frameBatch = []              // Empties the frameBatch

With this method, the frames for every two seconds of video are stored and then handed to an analysis function that also speaks the result. This function should run on a different thread. After that, a new two-second window starts, and the cycle continues. You should experiment with these numbers and tune the speech speed so that it does not lag behind the video.
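The batching idea can be sketched in runnable Python as follows. This is a rough illustration under stated assumptions: the names `batches`, `analyse_worker`, and `run` are hypothetical, `analyse_fn` stands in for the real gesture analysis plus speech, and any iterable of frames stands in for the webcam.

```python
from queue import Queue
from threading import Thread

FPS = 30   # assumed frame rate
SEC = 2    # seconds of video per batch


def batches(frames, batch_size):
    """Yield consecutive lists of batch_size frames; a trailing partial batch is discarded."""
    batch = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []


def analyse_worker(q, analyse_fn, out):
    """Run analyse_fn on each batch taken from q until None arrives."""
    while True:
        batch = q.get()
        if batch is None:
            break
        out.append(analyse_fn(batch))


def run(frames, analyse_fn, batch_size=FPS * SEC):
    """Collect frames into batches and analyse them on a separate thread."""
    out = []
    q = Queue()
    t = Thread(target=analyse_worker, args=(q, analyse_fn, out))
    t.start()
    for batch in batches(frames, batch_size):
        q.put(batch)        # the capture loop never waits for the analysis
    q.put(None)             # signal that no more batches will come
    t.join()
    return out
```

For example, `run(range(150), len)` treats the integers 0 to 149 as frames. It analyses two full batches of 60 and ignores the 30 leftover frames.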

You should also be careful with the while loop: it iterates extremely fast, so either pace it some other way or make sure your FPS is very high. If your FPS is high (more than 100), you can drop every other frame from the frameBatch to increase efficiency. Hope this helps.
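If the batch is a plain Python list, dropping every other frame can be done with slicing. In this small sketch the `thin` helper name is hypothetical.

```python
def thin(frame_batch, step=2):
    """Keep every step-th frame, starting with the first one."""
    return frame_batch[::step]
```

The same `[::2]` slice also works along the first axis of a NumPy array of frames.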