I have a video of people talking, and I also have a transcript of it. I chunked the words into sentences so that I can display one sentence at a time on screen, like normal movie subtitles. To do so, I created a CSV with one row per frame, where each row contains the full sentence being spoken during that frame. I then loop over all frames in OpenCV and draw the current sentence on each one.
Sample transcript CSV:
frame sentence
0 hello
1 hello
2 how are you
3 how are you
4 how are you
5 how are you
6 how are you
7 how are you
8 fine
...
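For reference, one way to build such a per-frame CSV from sentence time chunks is sketched below. It assumes each sentence comes as a `(sentence, start_sec, end_sec)` tuple and that the frame rate is known; `sentences_to_frames` and `write_frame_csv` are hypothetical names, not part of any library, and only the standard library is used.

```python
import csv


def sentences_to_frames(chunks, fps, num_frames):
    """Expand (sentence, start_sec, end_sec) chunks into one sentence per frame.

    Frames not covered by any chunk get an empty string (no subtitle).
    """
    per_frame = [""] * num_frames
    for text, start, end in chunks:
        # Convert the chunk's time span to a half-open frame range [first, last)
        first = max(0, int(round(start * fps)))
        last = min(num_frames, int(round(end * fps)))
        for f in range(first, last):
            per_frame[f] = text
    return per_frame


def write_frame_csv(per_frame, fileobj):
    """Write a frame,sentence CSV that pd.read_csv can load."""
    writer = csv.writer(fileobj)
    writer.writerow(["frame", "sentence"])
    for i, text in enumerate(per_frame):
        writer.writerow([i, text])
```

For the sample above, `sentences_to_frames([("hello", 0, 2), ("how are you", 2, 8), ("fine", 8, 9)], fps=1, num_frames=9)` reproduces frames 0 to 8. When reading the file back, `pd.read_csv("data.csv", keep_default_na=False)` keeps subtitle-free frames as empty strings; with the default settings they become NaN, and `str()` would then draw the text "nan".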
The CSV has exactly one row per video frame. To draw the subtitles, I do this:
import cv2
import pandas as pd

df = pd.read_csv('data.csv')
video = cv2.VideoCapture('vid.mp4')
# CAP_PROP_FRAME_COUNT is returned as a float, so cast it before using range()
num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
assert len(df) == num_frames

for i in range(num_frames):
    ret, frame = video.read()
    if not ret:  # stop if the decoder returns fewer frames than reported
        break
    # draw only this frame's sentence, not the whole column
    cv2.putText(frame, str(df.sentence[i]), (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3, cv2.LINE_AA, True)
    # additional standard cv2 code below...
This works, but the output has no audio. I understand OpenCV does not handle audio at all, but are there any workarounds? This approach fits well into my pipeline, so I'd like to write these frames to a new video while keeping the original audio, using as few additional libraries as possible.
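If the goal is to stay with OpenCV, one workaround is to let OpenCV write a silent video and then copy the audio track from the original file with the `ffmpeg` command-line tool (which MoviePy also relies on internally). This is a minimal sketch, assuming `ffmpeg` is on the PATH and that the `mp4v` FourCC is available in your OpenCV build; `build_mux_command`, `subtitle_then_mux` and the intermediate file name are hypothetical.

```python
import shutil
import subprocess


def build_mux_command(video_only, audio_source, output):
    """ffmpeg command: video stream from `video_only`, first audio stream
    from `audio_source`, both copied without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-i", video_only,    # input 0: subtitled frames written by OpenCV
        "-i", audio_source,  # input 1: the original video, which has the audio
        "-map", "0:v:0",     # take the video stream from input 0
        "-map", "1:a:0",     # take the first audio stream from input 1
        "-c", "copy",        # no re-encoding
        "-shortest",         # end at the shorter of the two streams
        output,
    ]


def subtitle_then_mux(src, sentences, silent_path, output):
    """Draw one sentence per frame with OpenCV, then add the audio back."""
    import cv2  # local import so build_mux_command has no third-party deps

    video = cv2.VideoCapture(src)
    fps = video.get(cv2.CAP_PROP_FPS)
    size = (int(video.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(video.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    # Assumption: "mp4v" works in this OpenCV build; available codecs vary
    writer = cv2.VideoWriter(silent_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    for text in sentences:
        ret, frame = video.read()
        if not ret:
            break
        # same putText call as in the question
        cv2.putText(frame, str(text), (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3, cv2.LINE_AA, True)
        writer.write(frame)
    video.release()
    writer.release()

    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg executable not found on PATH")
    subprocess.run(build_mux_command(silent_path, src, output), check=True)
```

For example: `subtitle_then_mux("vid.mp4", pd.read_csv("data.csv", keep_default_na=False).sentence, "silent.mp4", "vidout.mp4")`. Because `-c copy` skips re-encoding, the mux step is fast, but it only works if the output container accepts both streams as they are; if it does not, removing `-c copy` lets ffmpeg re-encode them.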
After trying the suggested MoviePy solution, I get a subtitled video with no audio, plus the error below:
Moviepy - Building video vidout.mp4.
MoviePy - Writing audio in vidoutTEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video vidout.mp4
t: 100%|████████████████████████████████████████████▉| 23069/23084 [07:26<00:00, 66.35it/s, now=None]
Traceback (most recent call last):
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/Clip.py", line 472, in iter_frames
    frame = self.get_frame(t)
  File "<decorator-gen-11>", line 2, in get_frame
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/decorators.py", line 89, in wrapper
    return f(*new_a, **new_kw)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/Clip.py", line 93, in get_frame
    return self.make_frame(t)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/Clip.py", line 136, in <lambda>
    newclip = self.set_make_frame(lambda t: fun(self.get_frame, t))
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/video/VideoClip.py", line 490, in <lambda>
    return self.fl(lambda gf, t: image_func(gf(t)), apply_to)
  File "make_demo.py", line 65, in pipeline
    cv2.putText(frame, str(next(dfi)[1].word), (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3, cv2.LINE_AA, True)
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "make_demo.py", line 72, in <module>
    out_video.write_videofile("vidout.mp4", audio=True)
  File "<decorator-gen-55>", line 2, in write_videofile
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/decorators.py", line 54, in requires_duration
    return f(clip, *a, **k)
  File "<decorator-gen-54>", line 2, in write_videofile
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/decorators.py", line 135, in use_clip_fps_by_default
    return f(clip, *new_a, **new_kw)
  File "<decorator-gen-53>", line 2, in write_videofile
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/decorators.py", line 22, in convert_masks_to_RGB
    return f(clip, *a, **k)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/video/VideoClip.py", line 307, in write_videofile
    logger=logger)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/video/io/ffmpeg_writer.py", line 221, in ffmpeg_write_video
    fps=fps, dtype="uint8"):
RuntimeError: generator raised StopIteration
If one additional library is okay, you could use MoviePy, which has audio support:
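import cv2
import pandas as pd
from moviepy.editor import VideoFileClip

def pipeline(frame):
    # MoviePy passes each frame as an RGB numpy array; draw the next CSV row on it
    try:
        cv2.putText(frame, str(next(dfi)[1].sentence), (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3, cv2.LINE_AA, True)
    except StopIteration:
        # MoviePy may request more frames than the CSV has rows; leave those frames as-is
        pass
    # additional frame manipulation
    return frame

# row iterator shared by all pipeline() calls: one row is consumed per processed frame
dfi = pd.read_csv('data.csv').iterrows()
video = VideoFileClip("vid.mp4")
out_video = video.fl_image(pipeline)
out_video.write_videofile("vidout.mp4", audio=True)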
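A variant that avoids the shared `dfi` iterator: instead of assuming `pipeline` is called exactly once per CSV row, it looks up the sentence from the timestamp `t` that MoviePy passes to `fl`. If MoviePy requests more frames than the CSV has rows (as in the traceback in the question), the last row is reused instead of raising `StopIteration`. This is a sketch against the MoviePy 1.x API used above (`moviepy.editor`, `fl`); `sentence_at` and `main` are hypothetical names, and `audio_codec="aac"` is an assumption that may help if the output plays silently in some players.

```python
def sentence_at(t, fps, sentences):
    """Return the subtitle for time t (seconds), clamped to the CSV's range."""
    # While writing, MoviePy requests t at (roughly) multiples of 1 / fps,
    # so rounding t * fps recovers the frame index despite float error.
    index = int(round(t * fps))
    index = min(max(index, 0), len(sentences) - 1)
    return sentences[index]


def main():
    # Third-party imports are local so sentence_at can be used on its own.
    import cv2
    import pandas as pd
    from moviepy.editor import VideoFileClip  # MoviePy 1.x import path

    # keep_default_na=False keeps frames without a subtitle as "" instead of NaN
    sentences = pd.read_csv("data.csv", keep_default_na=False)["sentence"].astype(str).tolist()
    video = VideoFileClip("vid.mp4")

    def draw(get_frame, t):
        # copy() in case the decoded frame is read-only or reused by MoviePy
        frame = get_frame(t).copy()
        cv2.putText(frame, sentence_at(t, video.fps, sentences), (0, 50),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3, cv2.LINE_AA, True)
        return frame

    # fl passes (get_frame, t) to draw and keeps the clip's audio attached
    out_video = video.fl(draw)
    out_video.write_videofile("vidout.mp4", audio=True, audio_codec="aac")
```

Call `main()` to render. In MoviePy 2.x the import path and method names changed (`fl` is `transform` there), so the last few lines would need adjusting for that version.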