So, I'm trying to learn the basics of Core Media since I need to process real-time audio samples in my app. For now I know that I need to configure an AVCaptureSession, setting an AVCaptureDevice used to acquire samples and an AVCaptureDataOutput that processes the input from the device and "notifies" an AVCaptureAudioDataSampleBufferDelegate through a captureOutput(...) method.
Now, this method gets passed the samples as a CMSampleBuffer object that, according to Apple's Core Media documentation, will contain zero or more media (audio in my case) samples and a CMBlockBuffer, that is:
[...] a CFType object that represents a contiguous range of data offsets [...] across a possibly noncontiguous memory region.
OK So this is kinda getting confusing. I'm not a native speaker and I'm struggling to understand what this is supposed to mean. Why do I need this to access my samples? Aren't they stored as an array of raw binary data (therefore homogeneous and contiguous)? I guess this is related to how the underlying memory is managed by Core Media but I can't figure it out.
Also the last batch of samples gets accessed through this CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer method, which expects an unsafe mutable pointer to an AudioBufferList and one to an optional CMBlockBuffer. The first one will be filled with pointers into the latter, and then I may (or may not) be able to access the samples through myAudioBufferList.mBuffers.mData, which might be nil.
Example code from Apple Developers code snippets:
    public func captureOutput(_ output: AVCaptureOutput,
                              didOutput sampleBuffer: CMSampleBuffer,
                              from connection: AVCaptureConnection) {
        var audioBufferList = AudioBufferList()
        var blockBuffer: CMBlockBuffer?
        CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
            sampleBuffer,
            bufferListSizeNeededOut: nil,
            bufferListOut: &audioBufferList,
            bufferListSize: MemoryLayout.stride(ofValue: audioBufferList),
            blockBufferAllocator: nil,
            blockBufferMemoryAllocator: nil,
            flags: kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
            blockBufferOut: &blockBuffer)
        guard let data = audioBufferList.mBuffers.mData else {
            return
        }
    }
What's the memory model (or pipeline) behind this? I truly appreciate any help.
Aren't they stored as an array of raw binary data (therefore homogeneous and contiguous)?
No. They're explicitly not stored this way, which is why you must use CMBlockBuffer to work with them. You can force them into a contiguous block if needed using withUnsafeMutableBytes, but this may force a copy to occur, and there is no promise that the pointer that the closure receives is valid outside of the closure.
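If you need the bytes to outlive any closure or callback, one option is to copy them out explicitly. The following is only a sketch: copyBytes(from:) is a hypothetical helper name, it uses the C-style CMBlockBufferGetDataLength and CMBlockBufferCopyDataBytes functions, and it allocates an array, so it trades speed for safety and is probably not what you want on a hot real-time path.

```swift
import CoreMedia

/// Hypothetical helper (not part of Core Media): copies every byte of
/// `blockBuffer` into a Swift array that the caller owns.
/// Returns nil if Core Media reports an error.
func copyBytes(from blockBuffer: CMBlockBuffer) -> [UInt8]? {
    let length = CMBlockBufferGetDataLength(blockBuffer)
    guard length > 0 else { return [] }

    var bytes = [UInt8](repeating: 0, count: length)
    let status = bytes.withUnsafeMutableBytes { destination -> OSStatus in
        // `length > 0`, so the array has storage and baseAddress is non-nil.
        // The copy walks the underlying blocks, contiguous or not.
        CMBlockBufferCopyDataBytes(blockBuffer,
                                   atOffset: 0,
                                   dataLength: length,
                                   destination: destination.baseAddress!)
    }
    return status == kCMBlockBufferNoErr ? bytes : nil
}
```

Because the result is an ordinary array, it stays valid after the capture callback returns, unlike a pointer handed to a closure.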
Allocating and copying memory are expensive operations. In real-time audio, it is best to avoid them whenever possible. CMBlockBuffer can stitch together existing blocks of memory without needing to allocate a single large block and then copy everything into it. It can just use pointers to all the existing blocks.
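To make the "contiguous offsets over noncontiguous memory" idea concrete, here is a toy model in plain Swift. It is not the Core Media API: StitchedBuffer and its members are hypothetical names, and it leans on Swift's copy-on-write array slices to stand in for "pointers to existing blocks".

```swift
/// Toy model, NOT Core Media: a buffer that references existing blocks
/// of bytes instead of copying them into one large allocation.
struct StitchedBuffer {
    // Each slice shares storage with the array it came from
    // (copy-on-write), so appending does not copy the bytes.
    private var segments: [ArraySlice<UInt8>] = []

    /// Total logical length across all segments.
    var count: Int { segments.reduce(0) { $0 + $1.count } }

    /// Adds a reference to an existing block at the end of the buffer.
    mutating func append(_ block: ArraySlice<UInt8>) {
        segments.append(block)
    }

    /// Reads one byte at a logical offset; offsets run 0..<count
    /// without gaps even though the storage is in separate blocks.
    subscript(offset: Int) -> UInt8 {
        var remaining = offset
        for segment in segments {
            if remaining < segment.count {
                return segment[segment.startIndex + remaining]
            }
            remaining -= segment.count
        }
        preconditionFailure("offset out of range")
    }

    /// True if the whole range lives inside a single underlying block.
    func isRangeContiguous(offset: Int, length: Int) -> Bool {
        var start = offset
        for segment in segments {
            if start < segment.count {
                return start + length <= segment.count
            }
            start -= segment.count
        }
        return false
    }
}

let blockA: [UInt8] = [1, 2, 3]
let blockB: [UInt8] = [4, 5]
var stitched = StitchedBuffer()
stitched.append(blockA[...])
stitched.append(blockB[...])
```

Here stitched has five bytes at offsets 0 through 4, but a range that crosses from blockA into blockB is not contiguous in memory. That is the situation in which a real block buffer would have to copy before it could hand you a single pointer.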
Retaining means increasing the reference count on a buffer. Multiple objects may reference the same blocks without requiring copying. This is a reference counted system. Retains increase the count and releases decrease it. When it reaches zero, the block of memory becomes available for reuse or is deallocated.
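The same idea can be sketched with an ordinary Swift class, where ARC performs the retains and releases for you. SampleBlock and demonstrateSharedOwnership are hypothetical names used only for illustration; Core Media buffers are Core Foundation objects, but the counting behaves analogously.

```swift
/// Hypothetical stand-in for a block of sample memory.
final class SampleBlock {
    let bytes: [UInt8]
    init(bytes: [UInt8]) { self.bytes = bytes }
}

/// Walks through retain/release by hand and reports what was observed.
func demonstrateSharedOwnership()
    -> (sharedWithoutCopy: Bool, readableAfterFirstRelease: Bool, freedAfterLastRelease: Bool) {
    var first: SampleBlock? = SampleBlock(bytes: [1, 2, 3]) // one strong reference
    var second: SampleBlock? = first                         // retain: two references, no copy
    weak var observer = first                                // weak: does not retain

    let shared = first === second           // both point at the same object

    first = nil                             // release: one reference left
    let readable = second?.bytes == [1, 2, 3] // block is still alive via `second`

    second = nil                            // release: count reaches zero
    let freed = observer == nil             // object has been deallocated

    return (shared, readable, freed)
}
```

All three values come back true: two owners share one block without copying, the block survives the first release, and it is freed only after the last one.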
Most of the time Swift programmers don't need to really interact with this retain/release system, other than avoiding retain cycles. The details are generally handled automatically by ARC. But it can be useful to understand in performance-critical contexts. For details, see Memory Management Programming Guide for Core Foundation. (Core Media types are generally Core Foundation types and follow the Core Foundation rules which are very slightly different from the Cocoa/ObjC rules.)
See CMMemoryPool for an example of how this can be used.