I'm working on some .mp4 files with Python, using the `wave`, `math`, `contextlib` and `speech_recognition` libraries plus moviepy's `AudioFileClip`. My files (video + audio) are very long. I would like Python to cut each file into new 5-minute files (still .mp4) and then transcribe each of them. So far, I have written the following code, which transcribes the original (long) file:
```python
import wave, math, contextlib
import speech_recognition as sr
from moviepy.editor import AudioFileClip  # moviepy 1.x; with moviepy 2.x use: from moviepy import AudioFileClip
import os

os.chdir(" ... my path ...")  # e.g. C:/Users/User/Desktop
FILE = "file_name"  # e.g. video1 (without extension)
transcribed_audio_file_name = FILE + "_transcribed_speech.wav"
mp4_video_file_name = FILE + ".mp4"

# Extract the audio track of the video into a WAV file
audioclip = AudioFileClip(mp4_video_file_name)
audioclip.write_audiofile(transcribed_audio_file_name)

# Duration in seconds = number of frames / sample rate
with contextlib.closing(wave.open(transcribed_audio_file_name, 'r')) as f:
    frames = f.getnframes()
    rate = f.getframerate()
    duration = frames / float(rate)
total_duration = math.ceil(duration / 60)  # number of 60-second chunks

r = sr.Recognizer()
# Open the transcript once and append the text of every 60-second chunk
with open(FILE + "_transcription.txt", "a") as out:
    for i in range(0, total_duration):
        with sr.AudioFile(transcribed_audio_file_name) as source:
            audio = r.record(source, offset=i*60, duration=60)
        try:
            # Call the API once per chunk and reuse the result
            text = r.recognize_google(audio, language="en-US")
        except sr.UnknownValueError:
            text = ""  # chunk contains no recognizable speech
        out.write(text + " ")
        print(text)
print("Transcription DONE.")
```
How can I add a step that takes the file "video", cuts it into 5-minute pieces, saves them as .mp4 files in my folder, and then processes (transcribes) each piece one by one? Thank you in advance!
I would recommend using MoviePy (the `moviepy` package, which you already use for `AudioFileClip`). If it isn't installed yet:
```shell
pip3 install moviepy
```
Let's say that the original video you want to cut is 15 minutes long and you want to create 3 smaller videos (5 minutes each).
Create a times.txt file with one start-end range (in seconds) per line:
```
0-300
300-600
600-900
```
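If the video length is not known in advance, the ranges can be generated instead of typed by hand. Below is a minimal sketch; the helper names `five_minute_ranges` and `write_times_file` are hypothetical, and it assumes whole-second boundaries with the last range ending at the rounded-up duration (so it may be shorter than 5 minutes).

```python
import math


def five_minute_ranges(duration_s, chunk_s=300):
    """Return (start, end) pairs, in whole seconds, that cover duration_s.

    The last range ends at the rounded-up duration, so it may be shorter
    than chunk_s.
    """
    total = math.ceil(duration_s)
    return [(start, min(start + chunk_s, total)) for start in range(0, total, chunk_s)]


def write_times_file(duration_s, path="times.txt", chunk_s=300):
    """Write one "start-end" line per range, in the times.txt format shown above."""
    with open(path, "w") as f:
        for start, end in five_minute_ranges(duration_s, chunk_s):
            f.write(f"{start}-{end}\n")
```

For example, a 1000-second video gives four ranges, the last one being (900, 1000). The duration itself can be read with moviepy's `VideoFileClip("filename.mp4").duration` (`VideoFileClip` is importable from `moviepy.editor` in moviepy 1.x and from `moviepy` in 2.x).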
Now the fun part, writing the code!
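```python
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip

# Replace the filename below.
required_video_file = "filename.mp4"

with open("times.txt") as f:
    # One "start-end" range (in seconds) per line; blank lines are skipped
    times = [line.strip() for line in f if line.strip()]

for index, time in enumerate(times, start=1):
    starttime, endtime = (int(t) for t in time.split("-"))
    # Writes 1.mp4, 2.mp4, ...; the output name is passed positionally because the
    # parameter is called `targetname` in moviepy 1.x and `outputfile` in moviepy 2.x.
    # Streams are copied, not re-encoded, so cut points may shift to a nearby keyframe.
    ffmpeg_extract_subclip(required_video_file, starttime, endtime, str(index) + ".mp4")
```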
Save the script as split.py and run it with:
```shell
python split.py
```
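To also transcribe each piece one by one, as the question asks, the transcription loop from the question can be run over the generated files. Below is a minimal sketch, assuming the pieces are named 1.mp4, 2.mp4, … as produced by the script above, and reusing the question's Google recognizer and 60-second slices. The names `transcribe_pieces` and `wav_duration_seconds` and the temporary per-piece `.wav` files are hypothetical. moviepy and speech_recognition are imported inside the function so the WAV helper works without them installed.

```python
import math
import os
import wave


def wav_duration_seconds(wav_path):
    """Return the length of a WAV file in seconds (frames / sample rate)."""
    with wave.open(wav_path, "rb") as f:
        return f.getnframes() / float(f.getframerate())


def transcribe_pieces(piece_files, transcript_path, chunk_s=60, language="en-US"):
    """Transcribe each .mp4 piece in order and append the text to transcript_path.

    Each piece's audio track is written to a temporary WAV next to it, then sent
    to Google's recognizer in chunk_s-second slices, as in the question's code.
    """
    import speech_recognition as sr
    try:
        from moviepy import AudioFileClip  # moviepy 2.x
    except ImportError:
        from moviepy.editor import AudioFileClip  # moviepy 1.x

    recognizer = sr.Recognizer()
    with open(transcript_path, "a") as out:
        for piece in piece_files:
            wav_path = os.path.splitext(piece)[0] + ".wav"
            clip = AudioFileClip(piece)
            clip.write_audiofile(wav_path)
            clip.close()
            n_chunks = math.ceil(wav_duration_seconds(wav_path) / chunk_s)
            for i in range(n_chunks):
                with sr.AudioFile(wav_path) as source:
                    audio = recognizer.record(source, offset=i * chunk_s, duration=chunk_s)
                try:
                    text = recognizer.recognize_google(audio, language=language)
                except sr.UnknownValueError:
                    text = ""  # no recognizable speech in this slice
                out.write(text + " ")
                print(piece, i, text)
            os.remove(wav_path)  # the WAV is only needed for transcription
```

After running split.py, a call such as `transcribe_pieces(["1.mp4", "2.mp4", "3.mp4"], "filename_transcription.txt")` transcribes the three pieces in order and appends all the text to a single file. Keeping the cutting and the transcription as separate steps means a failed API call only requires re-running the second step.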
Hope that helped!