Adding subtitles to video with Python

I have a video of people talking and a transcript of it. I chunked the words into sentences so that I can display one sentence at a time on screen, like normal movie subtitles. To do so, I created a CSV with one row per frame, where each row contains the full sentence spoken during that frame. This way I can loop over all frames and draw the sentence text on every frame that falls within that sentence. I do this in OpenCV.

Sample transcript CSV:

frame     sentence
0           hello
1           hello
2           how are you
3           how are you
4           how are you
5           how are you
6           how are you
7           how are you 
8           fine
...

The CSV has as many rows as the video has frames. To draw the subtitles, I do this:

import cv2
import pandas as pd

df = pd.read_csv('data.csv')
video = cv2.VideoCapture('vid.mp4')
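# get() returns a float, so convert it before using it as a count
num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))

assert len(df) == num_frames

for i in range(num_frames):
    ret, frame = video.read()
    if not ret:  # stop if a frame could not be read
        break
    # draw only the sentence for frame i, not the whole column
    cv2.putText(frame, str(df.sentence[i]), (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3, cv2.LINE_AA, True)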

    # additional standard cv2 code below...

This works, but the output has no audio. I understand that OpenCV does not handle audio, but are there any workarounds? This approach works well in my pipeline, so I'd like to write these frames to a new video and keep the audio while using as few additional libraries as possible.

EDIT

After using the suggested moviepy solution, I get a subtitled video with no audio and the error below:

Moviepy - Building video vidout.mp4.
MoviePy - Writing audio in vidoutTEMP_MPY_wvf_snd.mp3
MoviePy - Done.                                                                                      
Moviepy - Writing video vidout.mp4

t: 100%|████████████████████████████████████████████▉| 23069/23084 [07:26<00:00, 66.35it/s, now=None]Traceback (most recent call last):
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/Clip.py", line 472, in iter_frames
    frame = self.get_frame(t)
  File "<decorator-gen-11>", line 2, in get_frame
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/decorators.py", line 89, in wrapper
    return f(*new_a, **new_kw)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/Clip.py", line 93, in get_frame
    return self.make_frame(t)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/Clip.py", line 136, in <lambda>
    newclip = self.set_make_frame(lambda t: fun(self.get_frame, t))
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/video/VideoClip.py", line 490, in <lambda>
    return self.fl(lambda gf, t: image_func(gf(t)), apply_to)
  File "make_demo.py", line 65, in pipeline
    cv2.putText(frame, str(next(dfi)[1].word), (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3, cv2.LINE_AA, True)
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "make_demo.py", line 72, in <module>
    out_video.write_videofile("vidout.mp4", audio=True)
  File "<decorator-gen-55>", line 2, in write_videofile
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/decorators.py", line 54, in requires_duration
    return f(clip, *a, **k)
  File "<decorator-gen-54>", line 2, in write_videofile
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/decorators.py", line 135, in use_clip_fps_by_default
    return f(clip, *new_a, **new_kw)
  File "<decorator-gen-53>", line 2, in write_videofile
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/decorators.py", line 22, in convert_masks_to_RGB
    return f(clip, *a, **k)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/video/VideoClip.py", line 307, in write_videofile
    logger=logger)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/moviepy/video/io/ffmpeg_writer.py", line 221, in ffmpeg_write_video
    fps=fps, dtype="uint8"):
RuntimeError: generator raised StopIteration

Solution

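If one additional library is acceptable, you could use moviepy, which has audio support. The try/except in the code below keeps write_videofile from failing if the row iterator runs out before moviepy stops requesting frames. On Python 3.7 and later, a StopIteration that escapes inside a generator is re-raised as a RuntimeError, which is the error shown in the edit: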

import cv2
import pandas as pd
from moviepy.editor import VideoFileClip

def pipeline(frame):
    try:
        cv2.putText(frame, str(next(dfi)[1].sentence), (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3, cv2.LINE_AA, True)
    except StopIteration:
        pass
    # additional frame manipulation
    return frame

dfi = pd.read_csv('data.csv').iterrows()
video = VideoFileClip("vid.mp4")
out_video = video.fl_image(pipeline)
out_video.write_videofile("vidout.mp4", audio=True)
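
Because the iterator approach depends on pipeline being called exactly once per row, in order, a possible alternative is to look the sentence up from the timestamp of each frame. The sketch below assumes moviepy 1.x (the same moviepy.editor import as above) and the same data.csv layout. The names frame_index, subtitle_video and draw are hypothetical helpers introduced here for illustration, not part of moviepy or OpenCV.

```python
def frame_index(t, fps, n_rows):
    """Map a timestamp in seconds to a CSV row index, clamped to the valid range."""
    # round() guards against floating-point error in t * fps
    i = int(round(t * fps))
    return max(0, min(i, n_rows - 1))


def subtitle_video(src="vid.mp4", csv_path="data.csv", dst="vidout.mp4"):
    # Imported here so frame_index can be used without these packages installed.
    import cv2
    import pandas as pd
    from moviepy.editor import VideoFileClip

    sentences = pd.read_csv(csv_path).sentence.tolist()
    video = VideoFileClip(src)

    def draw(get_frame, t):
        # Work on a copy in case the decoded frame is read-only.
        frame = get_frame(t).copy()
        text = str(sentences[frame_index(t, video.fps, len(sentences))])
        # bottomLeftOrigin is left at its default (False) here.
        cv2.putText(frame, text, (0, 50), cv2.FONT_HERSHEY_SIMPLEX,
                    1, (0, 0, 0), 3, cv2.LINE_AA)
        return frame

    # fl passes (get_frame, t); fl_image passes only the frame.
    video.fl(draw).write_videofile(dst, audio=True)
```

Calling subtitle_video() with the default arguments should write vidout.mp4 with the original audio. Since the row is chosen from t rather than from call order, an extra or repeated call to draw cannot shift the subtitles, and the clamping means a small mismatch between the CSV length and the frame count shows the first or last sentence instead of raising an error.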