Ways to do inter-frame video compression in AVFoundation

I've created a process to generate video "slideshows" from collections of photographs and images in an application that I'm building. The process works correctly, but it creates unnecessarily large files, given that any photograph included in the video repeats unchanged for 100 to 150 frames. I've enabled whatever compression settings I could find in AVFoundation, which mostly apply intra-frame techniques, and I've tried to find more information on inter-frame compression in AVFoundation. Unfortunately, I've found only a few references, and none of them has helped me get it working.

I'm hoping that someone can steer me in the right direction. The code for the video generator is included below. I haven't included the code for fetching and preparing the individual frames (called below as self.getFrame()), because it seems to be working fine and is quite complex: it handles photos, videos, title frames, and fade transitions. For repeated frames, it returns a structure with the frame image and a count of the output frames to include.

        // Create a new AVAssetWriter Instance that will build the video

        assetWriter = createAssetWriter(path: filePathNew, size: videoSize!)
        guard assetWriter != nil else
        {
            print("Error converting images to video: AVAssetWriter not created.")
            inProcess = false
            return
        }

        let writerInput = assetWriter!.inputs.filter{ $0.mediaType == AVMediaTypeVideo }.first!

        let sourceBufferAttributes : [String : AnyObject] = [
            kCVPixelBufferPixelFormatTypeKey as String : Int(kCVPixelFormatType_32ARGB) as AnyObject,
            kCVPixelBufferWidthKey as String : videoSize!.width as AnyObject,
            kCVPixelBufferHeightKey as String : videoSize!.height as AnyObject,
            AVVideoMaxKeyFrameIntervalKey as String : 50 as AnyObject,
            AVVideoCompressionPropertiesKey as String : [
                AVVideoAverageBitRateKey: 725000,
                AVVideoProfileLevelKey: AVVideoProfileLevelH264Baseline30,
                ] as AnyObject
        ]

        let pixelBufferAdaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: writerInput, sourcePixelBufferAttributes: sourceBufferAttributes)

        // Start the writing session


        assetWriter!.startSession(atSourceTime: kCMTimeZero)

        if (pixelBufferAdaptor.pixelBufferPool == nil) {
            print("Error converting images to video: pixelBufferPool nil after starting session")
            inProcess = false
        }

        // -- Create queue for <requestMediaDataWhenReadyOnQueue>

        let mediaQueue = DispatchQueue(label: "mediaInputQueue")

        // Initialize run time values

        var presentationTime = kCMTimeZero
        var done = false
        var nextFrame: FramePack?                // The FramePack struct has the frame to output, noDisplays - the number of times that it will be output
                                                 // and an isLast flag that is true when it's the final frame

        writerInput.requestMediaDataWhenReady(on: mediaQueue, using: { () -> Void in    // Keeps invoking the block to get input until call markAsFinished

            nextFrame = self.getFrame()          // Get the next frame to be added to the output with its associated values
            let imageCGOut = nextFrame!.frame    // The frame to output
            if nextFrame!.isLast { done = true } // Identifies the last frame so can drop through to markAsFinished() below

            var frames = 0                       // Counts how often we've output this frame
            var waitCount = 0                    // Used to avoid an infinite loop if there's trouble with writer.Input

            while (frames < nextFrame!.noDisplays) && (waitCount < 1000000)  // Need to wait for writerInput to be ready - count deals with potential hung writer
            {
                waitCount += 1
                if waitCount == 1000000     // Have seen it go into 100s of thousands and succeed
                {
                    print("Exceeded waitCount limit while attempting to output slideshow frame.")
                    self.inProcess = false
                }

                if (writerInput.isReadyForMoreMediaData)
                {
                    waitCount = 0
                    frames += 1

                    if let pixelBufferPool = pixelBufferAdaptor.pixelBufferPool
                    {
                        let pixelBufferPointer = UnsafeMutablePointer<CVPixelBuffer?>.allocate(capacity: 1)
                        let status: CVReturn = CVPixelBufferPoolCreatePixelBuffer(
                            kCFAllocatorDefault,
                            pixelBufferPool,
                            pixelBufferPointer
                        )

                        if let pixelBuffer = pixelBufferPointer.pointee, status == 0
                        {
                            CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
                            let pixelData = CVPixelBufferGetBaseAddress(pixelBuffer)
                            let rgbColorSpace = CGColorSpaceCreateDeviceRGB()

                            // Set up a context for rendering using the PixelBuffer allocated above as the target

                            let context = CGContext(
                                data: pixelData,
                                width: Int(self.videoWidth),
                                height: Int(self.videoHeight),
                                bitsPerComponent: 8,
                                bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                                space: rgbColorSpace,
                                bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue
                            )

                            // Draw the image into the PixelBuffer used for the context

                            context?.draw(imageCGOut, in: CGRect(x: 0.0,y: 0.0,width: 1280, height: 720))

                            // Append the image (frame) from the context pixelBuffer onto the video file

                            _ = pixelBufferAdaptor.append(pixelBuffer, withPresentationTime: presentationTime)
                            presentationTime = presentationTime + CMTimeMake(1, videoFPS)

                            // We're done with the PixelBuffer, so unlock it

                            CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
                        }

                        pixelBufferPointer.deallocate(capacity: 1)

                    } else {
                        NSLog("Error: Failed to allocate pixel buffer from pool")
                    }
                }
            }

Thanks in advance for any suggestions.


  • It looks like you're

    1. appending a bunch of redundant frames to your video,
    2. labouring under a misapprehension: that video files must have a constant framerate that is high, e.g. 30fps.

    If, for example, you're showing a slideshow of 3 images over a duration of 15 seconds, then you need only output 3 frames, with presentation timestamps of 0s, 5s, and 10s, and then call assetWriter.endSession(atSourceTime:) with a time of 15s. You do not need 15s * 30 FPS = 450 frames.
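
    As a rough sketch of that idea (not a drop-in replacement for your code): the function below appends one frame per slide and then ends the session at the total duration. The names `slideStartTimes`, `appendSlides`, and `makePixelBuffer` are made up for illustration; `makePixelBuffer` stands in for your existing drawing code, and I'm assuming `startWriting()` and `startSession(atSourceTime:)` at time zero have already been called.

    ```swift
    import AVFoundation
    import CoreGraphics

    /// Start time, in seconds, of each slide: 0, d, 2d, ...
    func slideStartTimes(count: Int, secondsPerSlide: Double) -> [Double] {
        return (0..<count).map { Double($0) * secondsPerSlide }
    }

    /// Appends ONE frame per slide, then ends the session at the total duration.
    /// Assumes the writer is already writing and the session started at time zero.
    /// `makePixelBuffer` is a hypothetical stand-in for the question's drawing code.
    func appendSlides(_ slides: [CGImage],
                      secondsPerSlide: Double,
                      writer: AVAssetWriter,
                      input: AVAssetWriterInput,
                      adaptor: AVAssetWriterInputPixelBufferAdaptor,
                      makePixelBuffer: (CGImage) -> CVPixelBuffer?) -> Bool {
        let timescale: CMTimeScale = 600
        let starts = slideStartTimes(count: slides.count, secondsPerSlide: secondsPerSlide)

        for (slide, start) in zip(slides, starts) {
            // Simple polling for brevity; requestMediaDataWhenReady(on:using:)
            // is the non-blocking alternative.
            while !input.isReadyForMoreMediaData {
                guard writer.status == .writing else { return false }
                Thread.sleep(forTimeInterval: 0.01)
            }
            guard let buffer = makePixelBuffer(slide) else { return false }
            let time = CMTime(seconds: start, preferredTimescale: timescale)
            guard adaptor.append(buffer, withPresentationTime: time) else { return false }
        }

        // The last slide stays on screen until the session's end time.
        input.markAsFinished()
        let end = CMTime(seconds: Double(slides.count) * secondsPerSlide,
                         preferredTimescale: timescale)
        writer.endSession(atSourceTime: end)
        writer.finishWriting { }
        return true
    }
    ```

    With 3 slides at 5 seconds each, this appends frames at 0s, 5s, and 10s and ends the session at 15s.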

    In other words, your frame rate is way too high - for the best interframe compression money can buy, lower your frame rate to the bare minimum number of frames you need and all will be well*.

    *I've seen some video services/players choke on unusually low frame rates,
    so you may need a minimum frame rate and some redundant frames, e.g. 1 frame every 5 s. YMMV.
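
    If you do need such a floor, one way to plan the timestamps is sketched below. It is plain Swift with no AVFoundation calls; `ScheduledFrame`, `sparseSchedule`, and `maxGap` are names I've invented for the example. Each slide gets its first frame plus just enough evenly spaced repeats that no two consecutive frames are more than `maxGap` seconds apart.

    ```swift
    /// One entry per frame to append: which slide to show, and when (in seconds).
    struct ScheduledFrame: Equatable {
        let slideIndex: Int
        let time: Double
    }

    /// Builds a sparse schedule: each slide's first frame, plus evenly spaced
    /// repeats so that consecutive frames are never more than `maxGap` apart.
    func sparseSchedule(slideCount: Int,
                        secondsPerSlide: Double,
                        maxGap: Double) -> [ScheduledFrame] {
        precondition(secondsPerSlide > 0 && maxGap > 0)
        var frames: [ScheduledFrame] = []

        // Number of frames each slide needs to respect the maximum gap.
        let framesPerSlide = max(1, Int((secondsPerSlide / maxGap).rounded(.up)))
        let step = secondsPerSlide / Double(framesPerSlide)

        for slide in 0..<slideCount {
            let start = Double(slide) * secondsPerSlide
            for k in 0..<framesPerSlide {
                frames.append(ScheduledFrame(slideIndex: slide,
                                             time: start + Double(k) * step))
            }
        }
        return frames
    }
    ```

    For 3 slides of 5 seconds each, a `maxGap` of 5 gives the 3 frames from the example above, while a `maxGap` of 2.5 gives 6 frames, which is still far fewer than 450.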