ffmpeg-python drawtext after concatenating mp4 videos

I realize a similar question has been asked here, so before pointing to that question and answer, please understand that my question is different. I'm hoping that someone can point me in the right direction.

In short, I have 5 video segments that I'm concatenating and reordering based on user input. Is there a way that I can drawtext dynamically over the input without having to process the video a second time?

This is the code I have, and it works, but as you can see, I need to open the concatenated file and apply the text over that version. The result is then saved as a second file.

I'm looking for a more elegant way to accomplish this. Any suggestions would be appreciated.

        video1 = ffmpeg.input('./assets/v_1.mp4')
        video2 = ffmpeg.input('./assets/v_2.mp4')
        video3 = ffmpeg.input('./assets/v_3.mp4')
        video4 = ffmpeg.input('./assets/v_4.mp4')
        video5 = ffmpeg.input('./assets/v_5.mp4')



        print(row)

        ## IF Row 1 and 2 have values they get all five.
        if row[1] == '1' and row[2] == '1':
            print("Matches here")
            outfile = row[0]+'.mp4'
            ##DO Stuff
            joined = ffmpeg.concat(video1.video,video1.audio,video2.video,video2.audio,video3.video,video3.audio,video4.video,video4.audio,video5.video,video5.audio, v=1,a=1,unsafe=1).node
            vj = joined[0]
            va = joined[1].filter('volume', 1)

            out = ffmpeg.output(vj,va, outfile)
            out.run()
            ## Once Concat Video is finished, then it draws text over the video. 
            input2 = ffmpeg.input(row[0]+'.mp4').drawtext(fontfile='/Users/jserra/Library/Fonts/Cocogoose-Condensed-Regular-trial.ttf',fontsize='60',timecode='00:00:00:00',r=60,text=row[0],fontcolor='black',escape_text=True)
            ffmpeg.output(input2,row[0]+'_1.mp4').run()

I've tried this and receive the following error:

    video1 = ffmpeg.input('./assets/StMarys_1.mp4').drawtext(fontfile='/Users/jserra/Library/Fonts/Cocogoose-Condensed-Regular-trial.ttf',fontsize='60',timecode='00:00:00:00',r=60,text=row[0],fontcolor='black',escape_text=True)

Error:

    .virtualenvs/cvtesting/lib/python3.6/site-packages/ffmpeg/_run.py", line 93, in _allocate_filter_stream_names
    upstream_node, upstream_label
    ValueError: Encountered drawtext(fontcolor='black', fontfile='/Users/jserra/Library/Fonts/Cocogoose-Condensed-Regular-trial.ttf', fontsize='60', r=60, text='jack', timecode='00:00:00:00') <1d2ff6bbf3f0> with multiple outgoing edges with same upstream label None; a `split` filter is probably required

I've also tried chaining it after the videos are concatenated with joined. I still receive errors.

    joined = ffmpeg.concat(video1.video,video1.audio,video2.video,video2.audio,video3.video,video3.audio,video4.video,video4.audio,video5.video,video5.audio, v=1,a=1,unsafe=1).drawtext(fontfile='/Users/jserra/Library/Fonts/Cocogoose-Condensed-Regular-trial.ttf',fontsize='60',timecode='00:00:00:00',r=60,text=row[0],fontcolor='black',escape_text=True).node

Will I need to process these videos twice? If there are any optimizations that I can make, please let me know. Also, any pointers on displaying the drawn text for only a certain period of time would help. The documentation seems kinda spotty when it comes to controlling the duration, and I'm not sure what the values mean or how they affect each other.

Thanks

Solution

Ok, so for those that experience issues following examples that aren't a 1:1 translation, this is what I've realized.

If I apply the drawtext filter to the video that is returned in joined[0], I can add the text in at the right spot without encoding or processing the video twice.

I'm assuming this has to do with the fact that drawtext can only be applied to video and not audio (which makes sense). With v=1,a=1 the concat node has two outputs, and joined[0] is the video one.

        ## IF Row 1 and 2 have values they get all five.
        if row[1] == '1' and row[2] == '1':
            print("Matches here")
            outfile = row[0]+'.mp4'

            ##DO Stuff
            joined = ffmpeg.concat(video1.video,video1.audio,video2.video,video2.audio,video3.video,video3.audio,video4.video,video4.audio,video5.video,video5.audio, v=1,a=1,unsafe=1).node

            print(type(joined))
            print(joined)
            vj = joined[0].drawtext(fontfile='/Users/js/Library/Fonts/Cocogoose-Condensed-Regular-trial.ttf',fontsize='600',x=100,y=10,text=row[0],fontcolor='white',escape_text=True,enable='between(t,1,2.5)')
            va = joined[1].filter('volume', 1)

            out = ffmpeg.output(vj,va, outfile)
            out.run()
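
On the duration question: the enable='between(t,1,2.5)' argument above is what limits the text to a time window. As far as I can tell, `enable` is ffmpeg's timeline option and `between(t,a,b)` is true while the timestamp `t` (in seconds) is between `a` and `b`, so the text shows from 1s to 2.5s. A minimal sketch of a helper for building that string; `enable_between` is a name I made up, not part of ffmpeg-python:

```python
def enable_between(start, end):
    """Return an ffmpeg timeline expression that is true while start <= t <= end.

    start and end are in seconds. Hypothetical helper, not an ffmpeg-python call.
    """
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return "between(t,{},{})".format(start, end)


print(enable_between(1, 2.5))
```

It would be passed the same way as the literal string, e.g. joined[0].drawtext(text=row[0], x=100, y=10, enable=enable_between(1, 2.5)).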

I'm sure this isn't a thorough explanation, but it appears to be what makes sense in light of the tests that I've run. In my first example outlining the problem, audio was removed from the video after it was processed the second time. This is what gave me the idea to apply the drawtext filter to only the video returned in joined[0], since it appeared to be competing with the audio.
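
The same idea seems to apply to my earlier attempt at putting drawtext on each input: the error went away once drawtext was applied to the .video stream only and the audio was taken straight from the input. A rough sketch, assuming each segment gets its own text and that the files follow the ./assets/v_N.mp4 naming from above; `ordered_paths` and `build_output` are hypothetical helper names, and I haven't tried this with every ffmpeg-python version:

```python
def ordered_paths(order, template='./assets/v_{}.mp4'):
    """Map segment numbers such as [3, 1, 2] to file paths in that order."""
    return [template.format(n) for n in order]


def build_output(paths, texts, outfile, **drawtext_kwargs):
    """Concatenate `paths`, drawing texts[i] on the video of paths[i].

    Returns an ffmpeg-python output stream; call .run() on it to encode.
    """
    # Imported here so ordered_paths() stays usable without ffmpeg-python.
    import ffmpeg

    streams = []
    for path, text in zip(paths, texts):
        segment = ffmpeg.input(path)
        # drawtext goes on the video stream only; the audio comes straight
        # from the input, so the drawtext node feeds a single downstream edge.
        video = segment.video.drawtext(text=text, escape_text=True,
                                       **drawtext_kwargs)
        streams.extend([video, segment.audio])

    # One concat node with two outputs: [0] is video, [1] is audio.
    joined = ffmpeg.concat(*streams, v=1, a=1).node
    return ffmpeg.output(joined[0], joined[1], outfile)


print(ordered_paths([3, 1, 2]))
```

Usage would be something like build_output(ordered_paths([3, 1, 2]), ['a', 'b', 'c'], outfile, fontsize='60', fontcolor='white').run(). If every segment gets the same text, applying drawtext once to joined[0] as in the code above is simpler.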