I am writing Python code that takes an audio file, transcribes it, and removes part of the transcript when the word 'begin' is found. When 'begin' appears in the text variable, the code should remove all the text to its left.
import speech_recognition as sr
import openai
from gtts import gTTS
import os
import pygame
filename = "Recording.wav"
r = sr.Recognizer()
language = 'en'
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    text = r.recognize_google(audio_data).lower()
words = text.split()
b = 0
for word in words:
    b += 1
    if word == 'begin':
        text = text[:-b]
print(b)
print(text)
I tried giving it a text file with the word 'begin' in the middle of it. I was expecting to get a trimmed result but I didn't.
If I understand correctly, the keyword 'begin' indicates that everything after it is content you want to retain. I would phrase this using a regex replacement:
import re

text = r.recognize_google(audio_data).lower()
# Drop the text before 'begin', the keyword itself, and any whitespace after it
text = re.sub(r'.*?\bbegin\b\s*', '', text)
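One caveat: re.sub replaces every match by default, so if 'begin' is spoken more than once, the pattern above ends up stripping everything up to the last occurrence. Here is a minimal sketch, assuming you want to cut only at the first occurrence, which passes count=1. The helper name trim_before_keyword is hypothetical and only for illustration.

```python
import re

def trim_before_keyword(text, keyword="begin"):
    """Remove everything up to and including the first whole-word keyword."""
    # re.escape guards against keywords that contain regex metacharacters
    pattern = r".*?\b" + re.escape(keyword) + r"\b\s*"
    # count=1 stops after the first match, so later occurrences are kept
    return re.sub(pattern, "", text, count=1)

print(trim_before_keyword("intro words begin the real content"))
```

If the keyword is absent, this version returns the text unchanged.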
If you also require that 'begin' must appear, add a check for it:
text = r.recognize_google(audio_data).lower()
if re.search(r'\bbegin\b', text):
    text = re.sub(r'.*?\bbegin\b\s*', '', text)
else:
    text = ''  # 'begin' was never found, so keep nothing
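For comparison, the loop from the question can also be repaired without a regex. The slice text[:-b] drops the last b characters of the string, while b is a count of words, so it never removes the text to the left of 'begin'. Below is a sketch that works on the word list instead, assuming whitespace-separated words and an empty result when the keyword is missing. The function name words_after_keyword is hypothetical.

```python
def words_after_keyword(text, keyword="begin"):
    """Return the words that follow the first occurrence of keyword."""
    words = text.lower().split()
    if keyword not in words:
        # Keyword never appeared: return an empty string
        return ""
    # Index of the first word after the keyword
    start = words.index(keyword) + 1
    return " ".join(words[start:])

print(words_after_keyword("please ignore this begin keep this part"))
```

Unlike the regex version, this normalizes runs of whitespace to single spaces, which is usually harmless for a speech transcript.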