Live Video Encoding Using Intel Quick Sync and OpenCV Is Slow

I want to develop software that takes frames from a 4K camera and encodes them at the same time. Right now I can grab the frames and compress them to a .h264 file. The problem is that I want 10 FPS from my video. The output file does end up as 10 FPS, but the encoding loop does not get all the frames: I am only getting between 3 and 5 FPS.

When I dug into the code, I realized that the encoding function is fine, but the function that converts BGR->YUV is quite slow. In my case there is one dedicated computer and camera for this software, and that computer only has an Intel integrated GPU (so using FFmpeg with an NVIDIA GPU is not possible). How can I make this faster? This is the code I am using:

#include <iostream>
#include <opencv2/opencv.hpp>

using namespace cv;

int main() {
    Mat frame;
    VideoCapture vcap(0);
    if (!vcap.isOpened()) {
        std::cout << "Error opening video stream or file" << std::endl;
        return -1;
    }

    // Request 4K frames from the camera.
    vcap.set(CAP_PROP_FRAME_WIDTH, 3840);
    vcap.set(CAP_PROP_FRAME_HEIGHT, 2160);

    // Read back the size the camera actually delivers.
    int frame_width = static_cast<int>(vcap.get(CAP_PROP_FRAME_WIDTH));
    int frame_height = static_cast<int>(vcap.get(CAP_PROP_FRAME_HEIGHT));

    VideoWriter video;
    // Prints 1 if the Intel MFX writer opened successfully, 0 otherwise.
    std::cout << video.open("out.h264", CAP_INTEL_MFX, VideoWriter::fourcc('H', '2', '6', '4'), 10, Size(frame_width, frame_height), true);

    for (;;) {
        vcap >> frame;
        if (frame.empty()) {
            break;  // camera stopped delivering frames
        }
        video.write(frame);
    }

    return 0;
}

The BGR->YUV conversion function takes about 0.20 seconds per frame (~5 FPS). The encoding function takes about 0.045 seconds per frame (~20 FPS).

I expected the encoding to be the slow part, but apparently the conversion takes much longer, which is odd. There must be a solution for this.

GPU: Intel(R) UHD Graphics 620

CPU: Intel Core i5-8350U 1.70 GHz (8 logical cores)

Okay, so this is the write_one() function from OpenCV. It converts the frame to YUV and encodes it, and it is called for every frame. The conversion from BGR->YUV is done by cvtBGRtoTwoPlaneYUV(), which takes about 200 ms.

bool VideoWriter_IntelMFX::write_one(cv::InputArray bgr)
{
    mfxStatus res;
    mfxFrameSurface1 *workSurface = 0;
    mfxSyncPoint sync;
    clock_t start1 = clock();
    if (!bgr.empty() && (bgr.dims() != 2 || bgr.type() != CV_8UC3 || bgr.size() != frameSize))
    {
        MSG(cerr << "MFX: invalid frame passed to encoder: "
            << "dims/depth/cn=" << bgr.dims() << "/" << bgr.depth() << "/" << bgr.channels()
            << ", size=" << bgr.size() << endl);
        return false;

    }

    if (!bgr.empty())
    {
        workSurface = pool->getFreeSurface();
        if (!workSurface)
        {
            // not enough surfaces
            MSG(cerr << "MFX: Failed to get free surface" << endl);
            return false;
        }
        Mat src = bgr.getMat();
        hal::cvtBGRtoTwoPlaneYUV(src.data, src.step,
                                 workSurface->Data.Y, workSurface->Data.UV, workSurface->Data.Pitch,
                                 workSurface->Info.CropW, workSurface->Info.CropH,
                                 3, false, 1);
    }
    clock_t end1 = clock();
   
    clock_t start = clock();
    while (true)
    {
     
        outSurface = 0;
        DBG(cout << "Calling with surface: " << workSurface << endl);
        res = encoder->EncodeFrameAsync(NULL, workSurface, &bs->stream, &sync);
        if (res == MFX_ERR_NONE)
        {
            res = session->SyncOperation(sync, getWriterTimeoutMS()); // TODO: provide interface to modify timeout
            if (res == MFX_ERR_NONE)
            {
                // ready to write
                if (!bs->write())
                {
                    MSG(cerr << "MFX: Failed to write bitstream" << endl);
                    return false;
                }
                else
                {
                    DBG(cout << "Write bitstream" << endl);
                    /*RSI*/
                    clock_t end = clock();
                    frame_info[0] += (double(end - start) / CLOCKS_PER_SEC);
                    frame_info[2] += (double(end1 - start1) / CLOCKS_PER_SEC);
                    frame_info[1]++;
                    /*RSI*/
                    return true;
                }
            }
            else
            {
                MSG(cerr << "MFX: Sync error: " << res << endl);
                return false;
            }
        }
        else if (res == MFX_ERR_MORE_DATA)
        {
            DBG(cout << "ERR_MORE_DATA" << endl);
            return false;
        }
        else if (res == MFX_WRN_DEVICE_BUSY)
        {
            DBG(cout << "Waiting for device" << endl);
            sleep_ms(1000);
            continue;
        }
        else
        {
            MSG(cerr << "MFX: Bad status: " << res << endl);
            return false;
        }
      
    }
   
}
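To show what the conversion step has to do for every frame, here is a minimal scalar sketch of a BGR to two-plane YUV conversion (NV12: a full-resolution Y plane followed by interleaved U,V samples). It is not OpenCV's code. `bgr_to_nv12` and `clamp_u8` are hypothetical names, the coefficients are the commonly used BT.601 limited-range integer approximation, and chroma is simply taken from the top-left pixel of each 2x2 block, which is an assumption made to keep the example short.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Clamp an intermediate value into the 8-bit range.
static inline std::uint8_t clamp_u8(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Convert packed 8-bit BGR to NV12. Width and height must be even.
// Steps are row strides in bytes; the Y and UV planes share dst_step.
// Relies on arithmetic right shift of negative ints, which mainstream
// compilers provide (and C++20 guarantees).
void bgr_to_nv12(const std::uint8_t* bgr, std::size_t bgr_step,
                 std::uint8_t* y_plane, std::uint8_t* uv_plane,
                 std::size_t dst_step, int width, int height) {
    for (int row = 0; row < height; ++row) {
        const std::uint8_t* src = bgr + row * bgr_step;
        std::uint8_t* y_row = y_plane + row * dst_step;
        std::uint8_t* uv_row = uv_plane + (row / 2) * dst_step;
        for (int col = 0; col < width; ++col) {
            const int b = src[3 * col + 0];
            const int g = src[3 * col + 1];
            const int r = src[3 * col + 2];
            // Luma is computed for every pixel.
            y_row[col] = clamp_u8(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
            // One U,V pair is stored per 2x2 block of pixels.
            if (row % 2 == 0 && col % 2 == 0) {
                uv_row[col]     = clamp_u8(((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128);
                uv_row[col + 1] = clamp_u8(((112 * r - 94 * g - 18 * b + 128) >> 8) + 128);
            }
        }
    }
}
```

At 3840x2160 a loop like this visits about 8.3 million pixels per frame, which gives a feel for why a CPU-side conversion can cost more than the hardware encode that follows it.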

Solution

For anyone facing the same issue: after searching further, I found that Intel also provides a function that does the BGR->YUV conversion with hardware acceleration. It is about 4x faster than the cvtBGRtoTwoPlaneYUV() function above. However, it was still too slow for my use case. I had to get 30 FPS using the Intel(R) UHD Graphics 620, and apparently the only solution for now is to get a better GPU. The sample for this use case: https://github.com/sivabudh/intel-media-sdk-tutorials/tree/master/simple_6_encode_vmem_vpp_preproc

I just added the OpenCV x86 lib and wrote my own function that reads from a cv::Mat rather than from a file.
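The function itself is not shown here, so the following is only a sketch of what such a loader might look like. It assumes the hardware conversion takes 32-bit RGB4 input (bytes in B, G, R, A order) with a row pitch that can be larger than the image width, and that the source is a packed 3-channel BGR buffer such as the `data` and `step` of a CV_8UC3 cv::Mat. `copy_bgr_to_bgra` is a hypothetical helper, not part of OpenCV or the Media SDK.

```cpp
#include <cstddef>
#include <cstdint>

// Copy a packed 8-bit BGR image (3 bytes per pixel) into a
// 4-byte-per-pixel B,G,R,A buffer whose rows are dst_pitch bytes apart.
// Bytes between the end of a row and the next pitch boundary are left
// untouched.
void copy_bgr_to_bgra(const std::uint8_t* src, std::size_t src_step,
                      std::uint8_t* dst, std::size_t dst_pitch,
                      int width, int height) {
    for (int row = 0; row < height; ++row) {
        const std::uint8_t* s = src + row * src_step;
        std::uint8_t* d = dst + row * dst_pitch;
        for (int col = 0; col < width; ++col) {
            d[4 * col + 0] = s[3 * col + 0];  // B
            d[4 * col + 1] = s[3 * col + 1];  // G
            d[4 * col + 2] = s[3 * col + 2];  // R
            d[4 * col + 3] = 255;             // A, fully opaque
        }
    }
}
```

In the real program the destination pointer and pitch would presumably come from the locked input surface, and the source pointer and step from the captured frame. Those bindings are assumptions and are left out so that the sketch compiles without either library.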