avfoundation, video-processing, avkit, video-toolbox

Decoding ProRes RAW format to native Bayer representation


I am trying to decode a ProRes RAW video file, but it doesn't work. I always get:

Optional(Error Domain=AVFoundationErrorDomain Code=-11821 "Cannot Decode" UserInfo={NSLocalizedFailureReason=The media data could not be decoded. It may be damaged., NSLocalizedDescription=Cannot Decode, NSUnderlyingError=0x600002a982a0 {Error Domain=NSOSStatusErrorDomain Code=-12137 "(null)"}})

Here is the full decoder:

import AVFoundation
import VideoToolbox

class Decoder {

    private let assetReader: AVAssetReader?
    private let output: AVAssetReaderTrackOutput

    init() throws {

        VTRegisterProfessionalVideoWorkflowVideoDecoders()
        VTRegisterProfessionalVideoWorkflowVideoEncoders()

        // movieAsset and outputSettings are set up elsewhere (not shown)
        let assetReader = try AVAssetReader(asset: movieAsset)
        let tracks = movieAsset.tracks(withMediaType: .video)

        guard let firstTrack = tracks.first else {
            print("No video tracks found")
            throw NSError()
        }

        let out = AVAssetReaderTrackOutput(track: firstTrack, outputSettings: outputSettings)
        out.alwaysCopiesSampleData = true

        assetReader.add(out)

        self.assetReader = assetReader
        self.output = out
    }

    func run() {
        guard let assetReader = assetReader, assetReader.startReading() else {
            print("Failed to start asset reader")
            return
        }

        while assetReader.status == .reading {
            guard let sampleBuffer = output.copyNextSampleBuffer() else {
                print(assetReader.status.rawValue)
                print(assetReader.error)
                continue
            }

            print("Decoding success!")
        }
    }
}


Solution

  • It's not clear why you want Bayer, and I'm not sure what you mean by "native", but I guess you might want your data to be:

    1. at its highest possible definition, or
    2. in its most natural / efficient / least processed format
    3. just Bayer, don't ask me any more questions

    So there are two possibilities, I think.

    If you like high definition data, try setting your AVAssetReaderTrackOutput pixel format to kCVPixelFormatType_444YpCbCr16VideoRange_16A_TriPlanar, kCVPixelFormatType_4444AYpCbCr16 or kCVPixelFormatType_64RGBALE, or one of the other formats mentioned in the AVAssetReaderTrackOutput documentation. I'd think the chances are good that AVAssetReader won't gratuitously truncate the data.
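
    For example (a sketch only, reusing firstTrack from your init), requesting one of those wide formats just means passing a pixel format through outputSettings; kCVPixelFormatType_64RGBALE here is only one of the candidates above:

    let outputSettings: [String: Any] = [
        // one of the wide formats listed above -- swap in another if the reader rejects it
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_64RGBALE
    ]
    let out = AVAssetReaderTrackOutput(track: firstTrack, outputSettings: outputSettings)
    out.alwaysCopiesSampleData = true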

    I have no idea about natural or efficient representations when working with ProRes RAW, but if you really want Bayer output, you can set your outputSettings to nil and use a VTDecompressionSession to convert the raw sample buffers to kCVPixelFormatType_16VersatileBayer (or kCVPixelFormatType_64RGBAHalf or kCVPixelFormatType_128RGBAFloat, if you're still into high-range formats that AVAssetReader dislikes for some reason), but not kCVPixelFormatType_64RGBA_DownscaledProResRAW, as that doesn't seem to work.

    Anyway, you could lightly modify your code to decode to kCVPixelFormatType_16VersatileBayer like so:

    import AVFoundation
    import VideoToolbox
    
    class Decoder {
        private let assetReader: AVAssetReader?
        private let output: AVAssetReaderTrackOutput
        private var decompressionSession: VTDecompressionSession!
            
        init() throws {
            let movieUrl = URL(fileURLWithPath: "/Users/xxxx/ProresRAW_Video.MOV")
            let movieAsset = AVAsset(url: movieUrl)
            
            do {
                let assetReader = try AVAssetReader(asset: movieAsset)
                let tracks = movieAsset.tracks(withMediaType: .video)
                
                guard let firstTrack = tracks.first else {
                    print("No video tracks found")
                    throw NSError()
                }
                
                let out = AVAssetReaderTrackOutput(track: firstTrack, outputSettings: nil)
                out.alwaysCopiesSampleData = true
                
                assetReader.add(out)
                
                self.assetReader = assetReader
                self.output = out
                
            } catch {
                print(error)
                throw error
            }
            
        }
        
        func run() {
            guard let assetReader = assetReader, assetReader.startReading() else {
                print("Failed to start asset reader")
                return
            }
            
            while(assetReader.status == .reading) {
                guard let sampleBuffer = output.copyNextSampleBuffer() else {
                    print(assetReader.status.rawValue)
                    print(assetReader.error)
                    continue
                }
                
                print("Decoding success! \(sampleBuffer)")
                
                if let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer) {
                    if decompressionSession == nil {
                        let imageBufferAttributes: [CFString: Any] = [
                            kCVPixelBufferPixelFormatTypeKey: kCVPixelFormatType_16VersatileBayer
                        ]
                        var outputCallback = VTDecompressionOutputCallbackRecord(decompressionOutputCallback: { _, _, status, infoFlags, imageBuffer, presentationTimeStamp, presentationDuration in
                            assert(noErr == status)
                            print("decode callback status: \(status), bayer imageBuffer \(String(describing: imageBuffer)), flags: \(infoFlags), pts: \(presentationTimeStamp), duration: \(presentationDuration)")
                        }, decompressionOutputRefCon: nil)
                        let status = VTDecompressionSessionCreate(allocator: nil, formatDescription: formatDescription, decoderSpecification: nil, imageBufferAttributes: imageBufferAttributes as CFDictionary, outputCallback: &outputCallback, decompressionSessionOut: &decompressionSession)
                        assert(noErr == status)
                    }
    
                    let status = VTDecompressionSessionDecodeFrame(decompressionSession, sampleBuffer: sampleBuffer, flags: [], frameRefcon: nil, infoFlagsOut: nil)
                    assert(noErr == status)
                }
            }
        }
    }
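
    If you then want to get at the Bayer samples themselves, here's a rough sketch (my addition, untested against real ProRes RAW footage) of a helper you could call from the decompression output callback with the imageBuffer it hands you; it assumes kCVPixelFormatType_16VersatileBayer arrives as a single plane of 16-bit mosaic samples:

    import CoreVideo

    // Hypothetical helper, not part of the decoder above: log the geometry and the
    // first sample of a decoded Bayer buffer, assuming a single 16-bit plane.
    func dumpBayerBuffer(_ buffer: CVImageBuffer) {
        CVPixelBufferLockBaseAddress(buffer, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }

        let width = CVPixelBufferGetWidth(buffer)
        let height = CVPixelBufferGetHeight(buffer)
        let bytesPerRow = CVPixelBufferGetBytesPerRow(buffer)

        if let base = CVPixelBufferGetBaseAddress(buffer) {
            let samples = base.assumingMemoryBound(to: UInt16.self)
            print("bayer buffer: \(width)x\(height), \(bytesPerRow) bytes/row, first sample: \(samples[0])")
        }
    }

    Inside the output callback you could then replace the print with something like if let imageBuffer = imageBuffer { dumpBayerBuffer(imageBuffer) }.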
    

    The thing I don't get is why AVAssetReader, which probably uses VTDecompressionSession under the hood, doesn't simply let you request kCVPixelFormatType_16VersatileBayer in the first place. Maybe it's bloody-mindedness, or maybe it doesn't make sense? P.S. What are you trying to do?