
cut parts of a video using gstreamer/Python (gnonlin?)


I have a video file and I'd like to cut out some scenes (identified either by a time position or by a frame number). As far as I understand, that should be possible with gnonlin, but so far I wasn't able to find an example of how to do that (ideally using Python). I don't want to re-encode the video/audio if possible (though conversion to mp4/webm would be acceptable).

Am I correct that gnonlin is the right component in the gstreamer universe for this? I'd also be glad for some pointers/recipes on how to approach the problem (I'm a gstreamer newbie).


Solution

  • Actually it turns out that gnonlin is quite low-level and still requires a lot of gstreamer knowledge. Luckily there is gstreamer-editing-services (GES), a library that offers a higher-level API on top of gstreamer and gnonlin.

    With a bit of RTFM and a helpful blog post with a Python example, I was able to solve my basic problem:

    1. Load the asset (video)
    2. Create a Timeline with a single layer
    3. Add the asset to the layer multiple times, adjusting start, inpoint, and duration so that only the relevant parts of the video end up in the output

    Most of my code is taken directly from the blog post referenced above, so I won't dump all of it here. The relevant part is this:

        import gi
        gi.require_version('Gst', '1.0')
        gi.require_version('GES', '1.0')
        from gi.repository import Gst, GES

        # Both Gst and GES must be initialized before use
        Gst.init(None)
        GES.init()

        # source_uri is a URI, e.g. "file:///path/to/video.mp4"
        asset = GES.UriClipAsset.request_sync(source_uri)
        timeline = GES.Timeline.new_audio_video()
        layer = timeline.append_layer()

        start_on_timeline = 0
        start_position_asset = 10 * 60 * Gst.SECOND  # 10 minutes into the asset
        duration = 5 * Gst.SECOND                    # take 5 seconds
        # GES.TrackType.UNKNOWN => add every kind of stream to the timeline
        clip = layer.add_asset(asset, start_on_timeline, start_position_asset,
            duration, GES.TrackType.UNKNOWN)

        # second clip: starts on the timeline right after the first one
        start_on_timeline = duration
        start_position_asset = start_position_asset + 60 * Gst.SECOND  # 11:00
        duration = 20 * Gst.SECOND
        clip2 = layer.add_asset(asset, start_on_timeline, start_position_asset,
            duration, GES.TrackType.UNKNOWN)
        timeline.commit()
    

    The resulting video includes the asset segments 10:00–10:05 and 11:00–11:20, so essentially there are two cuts: one at the beginning (everything before 10:00 is dropped) and one in the middle (10:05–11:00 is dropped).
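    The timestamp arithmetic can be checked without GStreamer at all: `Gst.SECOND` is just one second expressed in nanoseconds (10^9), so a few integer operations reproduce the clip layout. A minimal sketch (the `SECOND` constant and `hms` helper below are stand-ins I introduced, not part of GES):

```python
# Stand-in for Gst.SECOND: GStreamer expresses all times in nanoseconds.
SECOND = 10 ** 9

def hms(ns):
    """Format a nanosecond timestamp as M:SS for readability."""
    s = ns // SECOND
    return "%d:%02d" % (s // 60, s % 60)

# Clip 1: starts at 0 on the timeline, 10:00 into the asset, 5 s long.
inpoint1, dur1 = 10 * 60 * SECOND, 5 * SECOND
# Clip 2: starts right after clip 1, 60 s further into the asset, 20 s long.
inpoint2, dur2 = inpoint1 + 60 * SECOND, 20 * SECOND

print(hms(inpoint1), "-", hms(inpoint1 + dur1))  # 10:00 - 10:05
print(hms(inpoint2), "-", hms(inpoint2 + dur2))  # 11:00 - 11:20
print("output length:", hms(dur1 + dur2))        # output length: 0:25
```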

    From what I have seen this worked perfectly fine: audio and video stay in sync, and there were no worries about key frames and whatnot. The only part left is to find out how to translate a frame number into a timing reference for gst-editing-services.
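    For the frame-number question, the usual approach (my assumption, not something covered by the blog post) is to convert the frame index via the stream's frame rate: `time_ns = frame * fps_denominator * Gst.SECOND / fps_numerator`. GStreamer ships `Gst.util_uint64_scale(val, num, denom)` for exactly this kind of overflow-safe integer scaling; the sketch below mimics it with plain Python integers so it runs without GStreamer installed:

```python
SECOND = 10 ** 9  # Gst.SECOND: one second in nanoseconds

def frame_to_time(frame, fps_n, fps_d=1):
    """Map a frame index to a nanosecond timestamp, equivalent to
    Gst.util_uint64_scale(frame, Gst.SECOND * fps_d, fps_n)."""
    return frame * SECOND * fps_d // fps_n

# Frame 250 of a 25 fps video starts at the 10-second mark.
print(frame_to_time(250, 25))             # 10000000000
# NTSC video is 30000/1001 fps, so frame 30000 starts at 1001 s, not 1000 s.
print(frame_to_time(30000, 30000, 1001))  # 1001000000000
```

    The resulting nanosecond value can be passed directly as the inpoint/duration arguments of `layer.add_asset`, which take the same `Gst.SECOND`-based units as the code above.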