I have an app that generates a bunch of jpgs that I need to turn into a webm video. I'm trying to get my rgb data from the jpegs into the vpxenc sample. I can see the basic shapes from the original jpgs in the output video, but everything is tinted green (even pixels that should be black are about halfway green) and every other scanline has some garbage in it.
I'm trying to feed it VPX_IMG_FMT_YV12 data, which I'm assuming is structured like so:
for each frame 8-bit Y data 8-bit averages of each 2x2 V block 8-bit averages of each 2x2 U block
Here is a source image and a screenshot of the video that is coming out:
It's entirely possible that I'm doing the RGB->YV12 conversion incorrectly, but even if I only encode the 8-bit Y data and set the U and V blocks to 0, the video looks about the same. I'm basically running my RGB data through this equation:
// (R, G, and B are 0-255)
float y = 0.299f*R + 0.587f*G + 0.114f*B;
float v = (R-y)*0.713f;
float u = (B-v)*0.565f;
.. and then to produce the 2x2 filtered values for U and V that I write into vpxenc, I just do (a + b + c + d) / 4, where a,b,c,d are the U or V values of each 2x2 pixel block.
So I'm wondering:
Is there an easier way (in code) to take RGB data and feed it to vpx_codec_encode to get a nice webm video?
Is my RGB->YV12 conversion wrong somewhere?
Any help would be greatly appreciated.
freefallr: Sure. Here is the code. Note that it's converting the RGB->YUV in place as well as putting the YV12 output into pFullYPlane/pDownsampledUPlane/pDownsampledVPlane. This code produced nice looking WebM videos when I modified their vpxenc sample to use this data.
void RGB_To_YV12( unsigned char *pRGBData, int nFrameWidth, int nFrameHeight, void *pFullYPlane, void *pDownsampledUPlane, void *pDownsampledVPlane )
{
int nRGBBytes = nFrameWidth * nFrameHeight * 3;
// Convert RGB -> YV12. We do this in-place to avoid allocating any more memory.
unsigned char *pYPlaneOut = (unsigned char*)pFullYPlane;
int nYPlaneOut = 0;
for ( int i=0; i < nRGBBytes; i += 3 )
{
unsigned char B = pRGBData[i+0];
unsigned char G = pRGBData[i+1];
unsigned char R = pRGBData[i+2];
float y = (float)( R*66 + G*129 + B*25 + 128 ) / 256 + 16;
float u = (float)( R*-38 + G*-74 + B*112 + 128 ) / 256 + 128;
float v = (float)( R*112 + G*-94 + B*-18 + 128 ) / 256 + 128;
// NOTE: We're converting pRGBData to YUV in-place here as well as writing out YUV to pFullYPlane/pDownsampledUPlane/pDownsampledVPlane.
pRGBData[i+0] = (unsigned char)y;
pRGBData[i+1] = (unsigned char)u;
pRGBData[i+2] = (unsigned char)v;
// Write out the Y plane directly here rather than in another loop.
pYPlaneOut[nYPlaneOut++] = pRGBData[i+0];
}
// Downsample to U and V.
int halfHeight = nFrameHeight >> 1;
int halfWidth = nFrameWidth >> 1;
unsigned char *pVPlaneOut = (unsigned char*)pDownsampledVPlane;
unsigned char *pUPlaneOut = (unsigned char*)pDownsampledUPlane;
for ( int yPixel=0; yPixel < halfHeight; yPixel++ )
{
int iBaseSrc = ( (yPixel*2) * nFrameWidth * 3 );
for ( int xPixel=0; xPixel < halfWidth; xPixel++ )
{
pVPlaneOut[yPixel * halfWidth + xPixel] = pRGBData[iBaseSrc + 2];
pUPlaneOut[yPixel * halfWidth + xPixel] = pRGBData[iBaseSrc + 1];
iBaseSrc += 6;
}
}
}