Tags: c++, windows, h.264, hlsl, ms-media-foundation

Convert colors from RGB to NV12


I’m working on an app that encodes video with the Media Foundation H.264 encoder. The sink writer crashes on Windows 7 when given RGB input in VRAM, failing with 0x8876086C (D3DERR_INVALIDCALL), so I’ve implemented my own RGB→NV12 conversion on the GPU. As a bonus, it saves more than 60% of PCI Express bandwidth (NV12 is 12 bits per pixel versus 32 for RGBA).
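For reference, NV12 stores a full-resolution Y plane followed by a half-resolution interleaved UV plane, which is where the bandwidth saving comes from. A minimal sketch of the layout arithmetic (my illustration, not code from the original app):

#include <cstddef>

// NV12 layout for a width x height frame (both even):
//   bytes [ 0, width*height )                : Y plane, 1 byte per pixel
//   bytes [ width*height, width*height*3/2 ) : interleaved UV plane,
//                                              one U,V byte pair per 2x2 pixel block
size_t nv12FrameBytes( size_t width, size_t height )
{
    return width * height * 3 / 2;  // 12 bits per pixel
}

size_t yOffset( size_t x, size_t y, size_t width )
{
    return y * width + x;
}

// Offset of the U byte for the 2x2 block containing (x, y); V immediately follows.
size_t uvOffset( size_t x, size_t y, size_t width, size_t height )
{
    return width * height + ( y / 2 ) * width + ( x / 2 ) * 2;
}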

Here’s what’s in my media types, both input (NV12) and output (H.264):

mt->SetUINT32( MF_MT_VIDEO_CHROMA_SITING, MFVideoChromaSubsampling_MPEG2 ); // Specifies the chroma encoding scheme for MPEG-2 video. Chroma samples are aligned horizontally with the luma samples, but are not aligned vertically. The U and V planes are aligned vertically.
mt->SetUINT32( MF_MT_YUV_MATRIX, MFVideoTransferMatrix_BT709 ); // ITU-R BT.709 transfer matrix.
mt->SetUINT32( MF_MT_VIDEO_NOMINAL_RANGE, MFNominalRange_0_255 ); // The normalized range [0...1] maps to [0...255] for 8-bit samples or [0...1023] for 10-bit samples.
mt->SetUINT32( MF_MT_TRANSFER_FUNCTION, MFVideoTransFunc_10 );  // Linear RGB (gamma = 1.0).
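For completeness, here is a minimal sketch of how those attributes might fit into the complete NV12 input media type; the frame size and frame rate below are placeholders, not values from the original app:

#include <atlbase.h>
#include <mfapi.h>
#include <mfidl.h>
#pragma comment( lib, "mfplat.lib" )

HRESULT createInputMediaType( IMFMediaType** result )
{
    CComPtr<IMFMediaType> mt;
    HRESULT hr = MFCreateMediaType( &mt );
    if( FAILED( hr ) ) return hr;
    mt->SetGUID( MF_MT_MAJOR_TYPE, MFMediaType_Video );
    mt->SetGUID( MF_MT_SUBTYPE, MFVideoFormat_NV12 );
    mt->SetUINT32( MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive );
    MFSetAttributeSize( mt, MF_MT_FRAME_SIZE, 1920, 1080 );  // placeholder resolution
    MFSetAttributeRatio( mt, MF_MT_FRAME_RATE, 60, 1 );      // placeholder frame rate
    mt->SetUINT32( MF_MT_VIDEO_CHROMA_SITING, MFVideoChromaSubsampling_MPEG2 );
    mt->SetUINT32( MF_MT_YUV_MATRIX, MFVideoTransferMatrix_BT709 );
    mt->SetUINT32( MF_MT_VIDEO_NOMINAL_RANGE, MFNominalRange_0_255 );
    mt->SetUINT32( MF_MT_TRANSFER_FUNCTION, MFVideoTransFunc_10 );
    return mt.CopyTo( result );
}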

The best result so far comes from this formula:

inline float3 yuvFromRgb( float3 rgb )
{
    float3 res;
    // Y: BT.709 luma coefficients scaled to map [0..255] into [16..235]
    res.x = dot( rgb, float3( 0.182585880, 0.614230573, 0.0620070584 ) );
    // U and V: scaled wider than the spec's [16..240] (see discussion below)
    res.y = dot( rgb, float3( -0.121760942, -0.409611613, 0.531372547 ) );
    res.z = dot( rgb, float3( 0.531372547, -0.482648790, -0.0487237722 ) );
    res += float3( 0.0627451017, 0.500000000, 0.500000000 );
    return saturate( res );
}
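To see what this scaling actually does, here is a quick C++ port of the formula (my own check, not part of the original shader). For a pure red or pure blue input, the chroma channel reaches 255 after saturation, rather than the 240 that a [16..240] encoding would produce:

#include <algorithm>
#include <cstdio>

// Direct C++ port of the shader above, to inspect its output range on the CPU.
static float saturate( float v ) { return std::min( 1.0f, std::max( 0.0f, v ) ); }

int main()
{
    const float r = 1, g = 0, b = 0;  // pure red
    float u = -0.121760942f * r - 0.409611613f * g + 0.531372547f * b + 0.5f;
    float v =  0.531372547f * r - 0.482648790f * g - 0.0487237722f * b + 0.5f;
    printf( "U = %.0f, V = %.0f\n", saturate( u ) * 255, saturate( v ) * 255 );
    // Prints "U = 96, V = 255"; without saturate(), V would be 263.
}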

What worries me is that this formula contradicts everything I’ve read on the internet: code samples and the official ITU specs alike.

For Y the formula is fine: I took the BT.709 coefficients and scaled them linearly to map [0..255] into [16..235], as the spec prescribes. The brightness is OK.

The specs say I must likewise scale U and V to map [0..255] into [16..240]. My eyes, however, tell me the result is undersaturated. To get correct colors I have to scale U and V the other way, from [0..255] into something like [-8..255+8].
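For reference, here is how that spec scaling works out numerically; this is my own sketch deriving the limited-range BT.709 coefficients from the luma weights, not code from the original app:

#include <cstdio>

int main()
{
    const double kr = 0.2126, kg = 0.7152, kb = 0.0722;  // BT.709 luma weights
    const double yScale  = 219.0 / 255.0;  // maps [0..255] into [16..235]
    const double uvScale = 224.0 / 255.0;  // maps [0..255] into [16..240]
    // Cb = ( B - Y ) / ( 2 * ( 1 - kb ) ),  Cr = ( R - Y ) / ( 2 * ( 1 - kr ) )
    printf( "Y: %.8f %.8f %.8f  + %.8f\n",
            kr * yScale, kg * yScale, kb * yScale, 16.0 / 255.0 );
    printf( "U: %.8f %.8f %.8f  + %.8f\n",
            -kr / ( 2 * ( 1 - kb ) ) * uvScale,
            -kg / ( 2 * ( 1 - kb ) ) * uvScale,
            0.5 * uvScale, 128.0 / 255.0 );
    printf( "V: %.8f %.8f %.8f  + %.8f\n",
            0.5 * uvScale,
            -kg / ( 2 * ( 1 - kr ) ) * uvScale,
            -kb / ( 2 * ( 1 - kr ) ) * uvScale, 128.0 / 255.0 );
}

The numbers this prints match the coefficients in the solution below.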

Why do I need to scale the other way to get correct colors after H.264 encoding and decoding? Will this code work on other people’s computers?


Solution

  • The problem was chroma subsampling artifacts. When I asked the question, I was judging the colors on a screenshot of colored console text. With 4:2:0 subsampling, each U,V pair is shared by a 2×2 block of pixels, so thin colored glyphs on a dark background lose most of their saturation, which made the spec-compliant formula look wrong.

    Today I’ve tried encoding a better image (the original post embedded it here). With that image, it became obvious that the correct formula is the one specified in the standards.

    So, here are the correct coefficients:

    // Convert an RGB color in [0..1] into ITU-R BT.709 YUV, limited range
    inline float3 yuvFromRgb( float3 rgb )
    {
        float3 res;
        // Y: BT.709 luma weights scaled by 219/255, mapping [0..255] into [16..235]
        res.x = dot( rgb, float3( 0.18258588, 0.61423057, 0.06200706 ) );
        // U, V: chroma weights scaled by 224/255, mapping [0..255] into [16..240]
        res.y = dot( rgb, float3( -0.10064373, -0.33857197, 0.43921569 ) );
        res.z = dot( rgb, float3( 0.43921569, -0.39894217, -0.04027352 ) );
        // Offsets: 16/255 for Y, 128/255 for U and V
        res += float3( 0.06274510, 0.50196081, 0.50196081 );
        return res;
    }
    

    They still give me an off-by-one error, but for my particular problem that 0.39% (1/255) error is acceptable.
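    As a quick sanity check of those coefficients (my addition, assuming round-to-nearest 8-bit quantization), pure white and pure black should land exactly on the limited-range extremes:

    #include <cmath>
    #include <cstdio>

    // Evaluate the shader's math on the CPU for one pixel.
    static void check( double r, double g, double b )
    {
        double y = 0.18258588 * r + 0.61423057 * g + 0.06200706 * b + 0.06274510;
        double u = -0.10064373 * r - 0.33857197 * g + 0.43921569 * b + 0.50196081;
        double v = 0.43921569 * r - 0.39894217 * g - 0.04027352 * b + 0.50196081;
        printf( "Y=%d U=%d V=%d\n",
                (int)std::lround( y * 255 ),
                (int)std::lround( u * 255 ),
                (int)std::lround( v * 255 ) );
    }

    int main()
    {
        check( 1, 1, 1 );  // white -> Y=235 U=128 V=128
        check( 0, 0, 0 );  // black -> Y=16  U=128 V=128
    }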