I would like to generate text files for frames extracted with ffmpeg, each containing the subtitle shown on that frame (if any), from a video into which I have burned the subtitles, also with ffmpeg.
I use a Python script with pysrt to open the SubRip file and generate the text files.
Each frame is named with its frame number by ffmpeg, and since the frames are extracted at a constant rate, I can easily retrieve the time position of a frame using the formula t1 = fnum/fps, where fnum is the frame number taken from the filename and fps is the rate passed to ffmpeg for the frame extraction.
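For reference, the lookup described above can be sketched roughly as follows. This is only an illustration under assumptions: the helper names frame_time and subtitle_at are hypothetical, the cue texts are invented, and plain (start, end, text) tuples in seconds stand in for the entries that pysrt would load from the SubRip file. Only the two boundary times 41.330 and 41.400 are taken from this question.

```python
import re

def frame_time(filename, fps):
    """Time in seconds from the frame number in the filename: t1 = fnum / fps."""
    fnum = int(re.search(r"(\d+)\.bmp$", filename).group(1))
    return fnum / fps

def subtitle_at(t, cues):
    """Return the text of the cue that is active at time t, or None."""
    for start, end, text in cues:
        if start <= t <= end:
            return text
    return None

# Hypothetical cues standing in for the parsed SubRip file.
cues = [(40.000, 41.330, "first line"), (41.400, 43.000, "second line")]

t1 = frame_time("extracted/00166.bmp", fps=4)
print(t1, subtitle_at(t1, cues))  # 41.5 second line
```

With these numbers the lookup returns a subtitle for frame 166, although the extracted image shows none, which is the kind of mismatch I am seeing.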
Even though I retrieve the text positions in the timeline from the same subtitle file that was burned into the video, I still get accuracy errors: some text files are missing, and some are present that shouldn't be.
Because time is not really continuous when talking about frames, I have tried recalibrating t1 using the fps of the video with the hardcoded subtitles; let's call it vfps, for video fps (I have ensured that the video fps is the same before and after subtitle burning). I get the formula: t2 = int(t1*vfps)/vfps
It still is not 100% accurate.
For example, my video is at 30 fps (vfps=30) and I extracted frames at 4 fps (fps=4).
The extracted frame 166 (fnum=166) shows no subtitle. In the SubRip file, the previous subtitle ends at t_prev=41.330 and the next subtitle begins at t_next=41.400, which means that the time computed for this frame, t_sub, should satisfy t_prev < t_sub and t_sub < t_next, but I can't make this happen.
Formulas I have tried:
t1 = fnum/fps # 41.5 > t_next
t2 = int(fnum*vfps/fps)/vfps # 41.5 > t_next
# is it because of an indexing problem? No:
t3 = (fnum-1)/fps # 41.25 < t_prev
t4 = int((fnum-1)*vfps/fps)/vfps # 41.23333333 < t_prev
t5 = int(fnum*vfps/fps - 1)/vfps # 41.466666 > t_next
t6 = int((fnum-1)*vfps/fps + 1)/vfps # 41.26666 < t_prev
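The arithmetic above can be double-checked with a short script. It is a sketch that only re-evaluates the six formulas with the numbers from this example; the names candidates, in_gap and gap_frames are introduced here for illustration. As an extra step it lists which 30 fps video frame indices k (with time k/vfps) fall strictly between t_prev and t_next.

```python
fnum, fps, vfps = 166, 4, 30
t_prev, t_next = 41.330, 41.400

# The six candidate formulas, evaluated for frame 166.
candidates = {
    "t1": fnum / fps,
    "t2": int(fnum * vfps / fps) / vfps,
    "t3": (fnum - 1) / fps,
    "t4": int((fnum - 1) * vfps / fps) / vfps,
    "t5": int(fnum * vfps / fps - 1) / vfps,
    "t6": int((fnum - 1) * vfps / fps + 1) / vfps,
}

def in_gap(t):
    """True if t lies strictly between the two subtitles."""
    return t_prev < t < t_next

for name, t in candidates.items():
    print(f"{name} = {t:.4f}  in gap: {in_gap(t)}")

# Indices of the video frames whose time k / vfps lies inside the gap.
gap_frames = [k for k in range(int(t_prev * vfps), int(t_next * vfps) + 2)
              if in_gap(k / vfps)]
print("video frames in gap:", gap_frames)
```

None of the six candidates lands in the gap. Only video frames 1240 and 1241 (about 41.333 s and 41.367 s) do, so the extracted image must correspond to one of those two frames.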
Commands used:
# burning subtitles
# (previously)
# ffmpeg -r 25 -i nosub.mp4 -vf subtitles=sub.srt withsub.mp4
# now:
ffmpeg -i nosub.mp4 -vf subtitles=sub.srt withsub.mp4
# frames extraction
ffmpeg -i withsub.mp4 -vf fps=4 extracted/%05d.bmp -hide_banner
Why does this happen and how can I solve this?
One thing I have noticed is that if I extract frames from both the original video and the subtitled one and compute the difference between corresponding frames, the result is not only the subtitles: there are also variations in the background (which shouldn't happen). If I run the same experiment using the same video twice, the difference is null, which means that the frame extraction is consistent.
Code for the difference:
ffmpeg -i withsub.mp4 -vf fps=4 extracted/%05d.bmp -hide_banner
ffmpeg -i nosub.mp4 -vf fps=4 extracted_no_sub/%05d.bmp -hide_banner
# subtract each frame without subtitles from the matching frame with subtitles
for img in extracted_no_sub/*.bmp; do
    convert "extracted/${img##*/}" "$img" -compose minus -composite "diff/${img##*/}"
done
You can extract frames with accurate timestamps like this:
ffmpeg -i nosub.mp4 -vf subtitles=sub.srt,settb=AVTB,select='if(eq(n\,0)\,1\,floor(4*t)-floor(4*prev_t))' -vsync 0 -r 1000 -frame_pts true extracted/%08d.bmp
This will extract the first frame from each quarter second. The output filename is 8 characters long, where the first 5 digits are the seconds and the last three are the milliseconds. You can change the field size based on the maximum file duration.
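On the Python side, such filenames can then be mapped back to a time and a subtitle without any fps arithmetic. The following is a minimal sketch, not a definitive implementation: time_from_pts_name and text_for_frames are hypothetical helper names, the two filenames and the cue texts are made up for illustration, and plain (start, end, text) tuples in seconds stand in for the entries you would load with pysrt.

```python
from pathlib import Path

def time_from_pts_name(filename):
    """Convert a name like '00041267.bmp' (timestamp in ms) to seconds."""
    return int(Path(filename).stem) / 1000

def text_for_frames(filenames, cues):
    """Map each frame filename to the cue text active at its timestamp (or None)."""
    result = {}
    for name in filenames:
        t = time_from_pts_name(name)
        result[name] = next(
            (text for start, end, text in cues if start <= t <= end), None
        )
    return result

# Hypothetical cues and frame names, for illustration only.
cues = [(40.000, 41.330, "first line"), (41.400, 43.000, "second line")]
frames = ["extracted/00041267.bmp", "extracted/00041500.bmp"]

for name, text in text_for_frames(frames, cues).items():
    print(name, text)
```

Because the timestamp comes from the frame itself, a frame that falls in a gap between two cues maps to None and gets no text file.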