CVPixelBufferLockBaseAddress(pixelBuffer, 0);
const size_t lumaPlaneIndex = 0;
size_t lumaPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, lumaPlaneIndex);
size_t lumaPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, lumaPlaneIndex);
const size_t cbcrPlaneIndex = 1;
size_t cbcrPlaneWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, cbcrPlaneIndex);
size_t cbcrPlaneHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, cbcrPlaneIndex);
NSLog(@"lumaPlaneWidth: %zu", lumaPlaneWidth);
NSLog(@"lumaPlaneHeight: %zu", lumaPlaneHeight);
NSLog(@"cbcrPlaneWidth: %zu", cbcrPlaneWidth);
NSLog(@"cbcrPlaneHeight: %zu", cbcrPlaneHeight);
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
Output on my iPhone 5 running iOS 7 for the front-facing camera is:
lumaPlaneWidth: 1280
lumaPlaneHeight: 720
cbcrPlaneWidth: 640
cbcrPlaneHeight: 360
The luma (Y) plane is twice the width and twice the height of the CbCr plane. Why is this?
The human eye is much more sensitive to changes in brightness than to changes in colour: it can discern brightness detail at a much higher spatial frequency, so that information is usually stored at a higher sampling rate. The motivation is simply the reality of human perception (plus, I guess, some consideration of bandwidth: you'd just capture as much as physically possible if data transmission were free).
The buffer you're getting has the Y (brightness) channel sampled at twice the rate of the Cb and Cr (colour) channels in each dimension, so four times as many Y samples in total. That's 4:2:0 chroma subsampling, which is exactly what the bi-planar 420YpCbCr pixel formats describe.
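To make the layout concrete, here's a minimal sketch (assuming the buffer is one of the bi-planar kCVPixelFormatType_420YpCbCr8BiPlanar* formats, which is what the camera delivers by default): plane 0 holds one Y byte per pixel, and plane 1 holds interleaved Cb/Cr pairs at half the resolution in both dimensions, so each 2x2 block of luma pixels shares a single Cb/Cr pair.
CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
uint8_t *lumaBase = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);
size_t lumaBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);
uint8_t *cbcrBase = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
size_t cbcrBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1);
// Arbitrary pixel, purely for illustration.
size_t x = 100, y = 100;
uint8_t yValue = lumaBase[y * lumaBytesPerRow + x];
// The chroma plane is half-resolution, with Cb and Cr interleaved per sample.
uint8_t cbValue = cbcrBase[(y / 2) * cbcrBytesPerRow + (x / 2) * 2];
uint8_t crValue = cbcrBase[(y / 2) * cbcrBytesPerRow + (x / 2) * 2 + 1];
NSLog(@"Y: %u Cb: %u Cr: %u", yValue, cbValue, crValue);
CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);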
Furthermore, 99.9999% of digital cameras capture through a colour filter array (almost always a Bayer filter specifically), which means they don't actually capture full colour at every photosite but rather capture individual primary components at adjoining sites and then combine them mathematically (demosaicing). That problem gets non-trivial if you want a really good estimate of the true signal. If the consumer is only ever going to need 4:2:0, it's cheaper to demosaic directly to 4:2:0. That's why the API isn't giving you 4:4:4, even though it doesn't know what you intend to do with the data.
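If you genuinely need full-resolution colour per pixel, you can ask AVCaptureVideoDataOutput to hand you BGRA frames instead and let the OS do the YCbCr-to-RGB conversion for you, at a performance cost, since the native camera formats are the bi-planar 4:2:0 ones. A sketch, assuming you're configuring the capture output yourself:
#import <AVFoundation/AVFoundation.h>
AVCaptureVideoDataOutput *videoOutput = [[AVCaptureVideoDataOutput alloc] init];
// Request 32-bit BGRA frames; the driver converts from the sensor's native
// 4:2:0 YCbCr, so this costs more than taking the bi-planar format directly.
videoOutput.videoSettings = @{ (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA) };
// You can inspect what the device offers natively:
NSLog(@"%@", videoOutput.availableVideoCVPixelFormatTypes);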