Tags: python, object-detection, text-to-speech, pyttsx3

Combining Object Detection with Text to Speech Code


I am trying to write an object detection + text-to-speech program that detects objects and announces them by voice on a Raspberry Pi 4. For now, I am writing a simple Python script that combines both parts in a single .py file, preferably as a function, which I will then run on the Raspberry Pi. I want to give credit to Murtaza's Workshop video "Object Detection OpenCV Python | Easy and Fast (2020)" and to https://pypi.org/project/pyttsx3/ for the pyttsx3 text-to-speech documentation. My code is below.

Whenever I enable the text-to-speech code (the commented-out pyttsx3 lines inside the loop), I keep getting errors. I believe it is some kind of looping problem, but I can't get the program to run continuously: without the TTS part it works fine, but with it the program runs for perhaps 3-5 seconds and then suddenly stops. I am a beginner but highly passionate about computer vision, and any help is appreciated!
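If the goal is to package the "detected objects to speech" step as a reusable function, one possible shape is sketched below. It is a minimal sketch, not pyttsx3 API: the names build_announcement, announce_detections and SpeechEngine are hypothetical, and it assumes an engine object with pyttsx3-style say() and runAndWait() methods and 1-based class IDs (matching the classId - 1 indexing in the script below). It speaks one sentence per frame instead of one per bounding box.

```python
from typing import Iterable, List, Protocol, Sequence


class SpeechEngine(Protocol):
    """Anything with pyttsx3-style say() and runAndWait() methods."""

    def say(self, text: str) -> None: ...

    def runAndWait(self) -> None: ...


def build_announcement(class_ids: Iterable[int], class_names: Sequence[str]) -> str:
    """Turn detected class IDs into one sentence such as 'person, cup detected'.

    Assumes 1-based class IDs; each name is kept once, in first-seen order.
    Returns an empty string when nothing was detected.
    """
    seen: List[str] = []
    for class_id in class_ids:
        name = class_names[int(class_id) - 1]  # int() also accepts numpy integers
        if name not in seen:
            seen.append(name)
    return (", ".join(seen) + " detected") if seen else ""


def announce_detections(engine: SpeechEngine, class_ids: Iterable[int],
                        class_names: Sequence[str]) -> str:
    """Speak a single sentence for all detections in one frame."""
    text = build_announcement(class_ids, class_names)
    if text:
        engine.say(text)
        engine.runAndWait()  # blocks until the sentence has been spoken
    return text
```

In the script, engine = pyttsx3.init() would be created once before while True:, and announce_detections(engine, classIds.flatten(), classNames) called once per frame inside the if len(classIds) != 0: block.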

import cv2
#import pyttsx3

cap = cv2.VideoCapture(0)
cap.set(3, 640)
cap.set(4, 480)

classNames = []
classFile = 'coco.names'
with open(classFile,'rt') as f:
    classNames = [line.rstrip() for line in f]

configPath = 'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt'
weightsPath = 'frozen_inference_graph.pb'

net = cv2.dnn_DetectionModel(weightsPath, configPath)
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

while True:
    success, img = cap.read()
    classIds, confs, bbox = net.detect(img, confThreshold=0.45)
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            className = classNames[classId-1]
            #engine = pyttsx3.init()
            #str1 = str(className)
            #engine.say(str1 + "detected")
            #engine.runAndWait()
            cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
            cv2.putText(img, classNames[classId-1].upper(), (box[0]+10, box[1]+30),
                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
            cv2.putText(img, str(round(confidence * 100, 2)), (box[0]+200, box[1]+30),
                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('Output', img)
    cv2.waitKey(1)

Here is a screenshot of my code.

Here is a link to the files needed to run the code as well, in case you need them.

Here is the error:

/Users/venuchannarayappa/PycharmProjects/ObjectDetector/venv/bin/python /Users/venuchannarayappa/PycharmProjects/ObjectDetector/main.py
Traceback (most recent call last):
  File "/Users/venuchannarayappa/PycharmProjects/ObjectDetector/main.py", line 24, in <module>
    classIds, confs, bbox = net.detect(img, confThreshold=0.45)
cv2.error: OpenCV(4.5.4) /Users/runner/work/opencv-python/opencv-python/opencv/modules/imgproc/src/resize.cpp:4051: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

Process finished with exit code 1
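The assertion !ssize.empty() in function 'resize' means the image passed to net.detect() was empty. The most likely cause is that cap.read() failed, returning success == False and img == None, and the loop passed that None on without checking success. A minimal guard is sketched below as a generator that works with any object exposing a cv2.VideoCapture-style read() method; frames and FrameSource are hypothetical names, not OpenCV API.

```python
from typing import Any, Iterator, Protocol, Tuple


class FrameSource(Protocol):
    """Anything with a cv2.VideoCapture-style read() -> (ok, frame)."""

    def read(self) -> Tuple[bool, Any]: ...


def frames(source: FrameSource) -> Iterator[Any]:
    """Yield frames until read() reports a failure.

    cv2.VideoCapture.read() returns (False, None) when no frame could be
    grabbed; feeding that None to net.detect() is what triggers the
    "!ssize.empty()" assertion, so the generator stops instead.
    """
    while True:
        ok, frame = source.read()
        if not ok or frame is None:
            return
        yield frame
```

In the script, for img in frames(cap): would replace while True: and the cap.read() line, with the detection code unchanged inside the loop.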

Link to a video of the output, recorded on my iPhone: https://www.icloud.com/iclouddrive/03jGfqy7-A9DKfekcu3wjk0rA#IMG_4932

Sorry for such a long post! After debugging my code for the past few hours, I think I got it to work. I only changed the main while loop; the rest of the code is the same. The key difference is that pyttsx3.init() is now called once, before the loop, instead of on every detection. The program seems to run continuously for me, and I would appreciate any comments if you have difficulty running it.

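import cv2
import pyttsx3

# cap, classNames and net are created exactly as in the first script.

engine = pyttsx3.init()  # create the TTS engine once, not on every detection
while True:
    success, img = cap.read()
    if not success:
        # cap.read() returns (False, None) when no frame could be grabbed;
        # passing None to net.detect() raises the "!ssize.empty()" error
        break
    classIds, confs, bbox = net.detect(img, confThreshold=0.45)
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            className = classNames[classId - 1]  # class IDs are 1-based, the list is 0-based
            engine.say(className + " detected")  # space keeps "person detected" as two words
            engine.runAndWait()  # blocks until the phrase has been spoken
            cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
            cv2.putText(img, className.upper(), (box[0] + 10, box[1] + 30),
                        cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
            cv2.putText(img, str(round(confidence * 100, 2)), (box[0] + 200, box[1] + 30),
                        cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
    # Show every frame, including the ones with detections
    cv2.imshow('Output', img)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()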
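One thing to be aware of in this loop: engine.runAndWait() blocks until speech finishes, and it runs for every box on every frame, so the video pauses while speaking and the same object is announced again and again. One way to reduce this, sketched here as an assumption rather than anything built into pyttsx3 or OpenCV, is to remember when each class name was last spoken and skip it for a cooldown period. CooldownAnnouncer, due() and cooldown_s are hypothetical names; the clock parameter exists only so the logic can be tested.

```python
import time
from typing import Callable, Dict, Iterable, List


class CooldownAnnouncer:
    """Decide which class names to speak, skipping names spoken recently."""

    def __init__(self, cooldown_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cooldown_s = cooldown_s  # seconds before a name may be repeated
        self.clock = clock
        self.last_spoken: Dict[str, float] = {}

    def due(self, names: Iterable[str]) -> List[str]:
        """Return the names to announce now, in first-seen order, and record them."""
        now = self.clock()
        result: List[str] = []
        for name in names:
            last = self.last_spoken.get(name)
            if name in result or (last is not None and now - last < self.cooldown_s):
                continue
            self.last_spoken[name] = now
            result.append(name)
        return result
```

Inside the if len(classIds) != 0: block, one could build names = [classNames[i - 1] for i in classIds.flatten()], pass them to announcer.due(names), and, if the result is non-empty, call engine.say() once with the joined names followed by a single engine.runAndWait().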

I am planning to run this code on the Raspberry Pi and to install OpenCV with this command: pip3 install opencv-python. However, I am not sure how to install pyttsx3, since I think I need to install it from source. Please let me know if there is a simpler way to install pyttsx3.

Update: As of December 27th, I have installed all necessary packages and my code is now functional.


Solution

  • I installed pyttsx3 by running these two commands in a terminal on the Raspberry Pi:

    1. sudo apt update && sudo apt install espeak ffmpeg libespeak1
    2. pip install pyttsx3

    I followed the video youtube.com/watch?v=AWhDDl-7Iis&ab_channel=AiPhile to install pyttsx3. My working code is listed above, and I have since made minor tweaks to it. My question is resolved, but hopefully this is useful to anyone writing a similar program.
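After running the install commands, a quick way to confirm that Python can see both packages is to check whether their modules are importable. This is a minimal sketch using only the standard library; check_packages is a hypothetical helper. Note that the pip package opencv-python provides the module cv2. The check only confirms that the modules are installed, not that espeak actually produces sound; for that, calling pyttsx3.init() and speaking a short phrase is the real test.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(modules: Iterable[str] = ("cv2", "pyttsx3")) -> Dict[str, bool]:
    """Report, for each top-level module name, whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```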