Tags: numpy, for-loop, yuv

NV12 to YUV444 speed up


I have code that converts an image from NV12 to YUV444:

for h in range(self.img_shape[0]):
    # centralize yuv 444 data for inference framework
    for w in range(self.img_shape[1]):
        yuv444_res[h][w][0] = (nv12_y_data[h * self.img_shape[1] + w]).astype(np.int8)
        yuv444_res[h][w][1] = (nv12_u_data[int(h / 2) * int(self.img_shape[1] / 2) + int(w / 2)]).astype(np.int8)
        yuv444_res[h][w][2] = (nv12_v_data[int(h / 2) * int(self.img_shape[1] / 2) + int(w / 2)]).astype(np.int8)

Since for loops are very slow in Python, much slower than NumPy operations, I was wondering whether this conversion can be done as a NumPy calculation.
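
For reference, the 2x2 chroma replication that the loop performs maps directly onto NumPy indexing. A minimal sketch, assuming nv12_y_data, nv12_u_data and nv12_v_data are the flat uint8 planes from the loop above, and h, w come from self.img_shape:

    import numpy as np

    h, w = self.img_shape[0], self.img_shape[1]
    y = nv12_y_data.reshape(h, w)
    u = nv12_u_data.reshape(h // 2, w // 2)
    v = nv12_v_data.reshape(h // 2, w // 2)

    # np.repeat duplicates every chroma sample 2x along both axes,
    # matching the int(h / 2), int(w / 2) indexing of the loop.
    u_full = u.repeat(2, axis=0).repeat(2, axis=1)
    v_full = v.repeat(2, axis=0).repeat(2, axis=1)

    yuv444 = np.dstack((y, u_full, v_full)).astype(np.int8)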

Update on 06/15/2021:

I was able to get this piece of code, using fancy indexing, from this page (external link):

    import numpy as np
    from PIL import Image

    yuv444 = np.empty([self.height, self.width, 3], dtype=np.uint8)
    yuv444[:, :, 0] = nv12_data[:self.width * self.height].reshape(
        self.height, self.width)
    u = nv12_data[self.width * self.height::2].reshape(
        self.height // 2, self.width // 2)
    yuv444[:, :, 1] = Image.fromarray(u).resize((self.width, self.height))
    v = nv12_data[self.width * self.height + 1::2].reshape(
        self.height // 2, self.width // 2)
    yuv444[:, :, 2] = Image.fromarray(v).resize((self.width, self.height))

    data[0] = yuv444.astype(np.int8)

If PIL is used to replace the deprecated imresize, the code matches the old code 100%.

Update on 06/19/2021:

After a closer look at the answer Rotem gave, I realized that his way is quicker.

    import numpy as np
    import cv2

    # nv12_data is reshaped to one dimension
    y = nv12_data[:self.width * self.height].reshape(
        self.height, self.width)
    shrunk_u = nv12_data[self.width * self.height::2].reshape(
        self.height // 2, self.width // 2)
    shrunk_v = nv12_data[self.width * self.height + 1::2].reshape(
        self.height // 2, self.width // 2)
    u = cv2.resize(shrunk_u, (self.width, self.height),
                   interpolation=cv2.INTER_NEAREST)
    v = cv2.resize(shrunk_v, (self.width, self.height),
                   interpolation=cv2.INTER_NEAREST)
    yuv444 = np.dstack((y, u, v))

Also, I did a time comparison for processing 1000 pictures. It turns out the cv2 version is quicker and guarantees the same result.

cv time: 4.417593002319336, pil time: 5.395732164382935
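
The comparison loop was roughly of this shape (a hypothetical harness for illustration, not the exact script used; y, shrunk_u, shrunk_v, width and height come from the snippet above):

    import time
    import cv2
    import numpy as np

    start = time.time()
    for _ in range(1000):
        u = cv2.resize(shrunk_u, (width, height), interpolation=cv2.INTER_NEAREST)
        v = cv2.resize(shrunk_v, (width, height), interpolation=cv2.INTER_NEAREST)
        yuv444 = np.dstack((y, u, v))  # assemble the 3-channel YUV444 image
    print('cv time:', time.time() - start)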

Update on 06/25/2021:

Pillow's resize has different default resample parameter values across versions.

5.1.0:

    def resize(self, size, resample=NEAREST, box=None):

8.1.0:

    def resize(self, size, resample=BICUBIC, box=None, reducing_gap=None):

It would be a good idea to specify the resample strategy used.
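
For example, with the PIL snippet above (NEAREST is the filter that reproduces the original for-loop output):

    from PIL import Image

    # Pin the filter explicitly so the output does not change with the
    # installed Pillow version.
    yuv444[:, :, 1] = Image.fromarray(u).resize((self.width, self.height),
                                                resample=Image.NEAREST)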


Solution

  • You may use the process described in my following post, in reverse order (without the RGB part).

    Illustration: [image]


    Start by creating a synthetic sample image in NV12 format, using FFmpeg (command line tool).
    The sample image is used for testing.
    Note: a 192x108 NV12 frame occupies 192x162 when viewed as a single grayscale plane, because the full-resolution Y plane (108 rows) is followed by the interleaved UV plane (54 rows), and 108 * 3/2 = 162. That is why the second command uses -video_size 192x162.

    Executing from Python using subprocess module:

    import subprocess as sp
    import shlex
    
    sp.run(shlex.split('ffmpeg -y -f lavfi -i testsrc=size=192x108:rate=1:duration=1 -vcodec rawvideo -pix_fmt nv12 nv12.yuv'))
    sp.run(shlex.split('ffmpeg -y -f rawvideo -video_size 192x162 -pixel_format gray -i nv12.yuv -pix_fmt gray nv12_gray.png'))
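
    If you prefer to skip the intermediate grayscale PNG, the raw file can also be read directly (a sketch, assuming the 192x108 frame generated above):

    import numpy as np

    # NV12 stores 1.5 bytes per pixel: the 108-row Y plane is followed by
    # 54 rows of interleaved U/V, hence 162 rows of width 192.
    nv12 = np.fromfile('nv12.yuv', dtype=np.uint8).reshape(108 * 3 // 2, 192)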
    

    Read the sample image, and execute the code from your post (used as reference):

    import numpy as np
    import cv2
    
    nv12 = cv2.imread('nv12_gray.png', cv2.IMREAD_GRAYSCALE)
    cols, rows = nv12.shape[1], nv12.shape[0]*2//3
    
    # Reference implementation - using for-loops (the solution is in the part below):
    ################################################################################
    nv12_y_data = nv12[0:rows, :].flatten()
    nv12_u_data = nv12[rows:, 0::2].flatten()
    nv12_v_data = nv12[rows:, 1::2].flatten()
    
    yuv444_res = np.zeros((rows, cols, 3), np.uint8)
    
    for h in range(rows):
        # centralize yuv 444 data for inference framework
        for w in range(cols):
            yuv444_res[h][w][0] = (nv12_y_data[h * cols + w]).astype(np.int8)
            yuv444_res[h][w][1] = (nv12_u_data[int(h / 2) * int(cols / 2) + int(w / 2)]).astype(np.int8)
            yuv444_res[h][w][2] = (nv12_v_data[int(h / 2) * int(cols / 2) + int(w / 2)]).astype(np.int8)
    
    ################################################################################
    

    My suggested solution applies the following stages:

    • Separate U and V into two "half size" matrices shrunk_u and shrunk_v.
    • Resize shrunk_u and shrunk_v to full image size matrices using cv2.resize.
      In my code sample I used nearest neighbor interpolation to get the same result as yours.
      It is recommended to replace it with linear interpolation for better quality (see the variant after the complete sample below).
    • Use np.dstack for merging Y, U and V into YUV (3 color channels) image.

    Here is the complete code sample:

    import numpy as np
    import subprocess as sp
    import shlex
    import cv2
    
    sp.run(shlex.split('ffmpeg -y -f lavfi -i testsrc=size=192x108:rate=1:duration=1 -vcodec rawvideo -pix_fmt nv12 nv12.yuv'))
    sp.run(shlex.split('ffmpeg -y -f rawvideo -video_size 192x162 -pixel_format gray -i nv12.yuv -pix_fmt gray nv12_gray.png'))
    #sp.run(shlex.split('ffmpeg -y -f rawvideo -video_size 192x108 -pixel_format nv12 -i nv12.yuv -vcodec rawvideo -pix_fmt yuv444p yuv444.yuv'))
    #sp.run(shlex.split('ffmpeg -y -f rawvideo -video_size 192x324 -pixel_format gray -i yuv444.yuv -pix_fmt gray yuv444_gray.png'))
    #sp.run(shlex.split('ffmpeg -y -f rawvideo -video_size 192x108 -pixel_format yuv444p -i yuv444.yuv -pix_fmt rgb24 rgb.png'))
    #sp.run(shlex.split('ffmpeg -y -f rawvideo -video_size 192x108 -pixel_format gbrp -i yuv444.yuv -filter_complex "extractplanes=g+b+r[g][b][r],[r][g][b]mergeplanes=0x001020:gbrp[v]" -map "[v]" -vcodec rawvideo -pix_fmt rgb24 yuvyuv.yuv'))
    #sp.run(shlex.split('ffmpeg -y -f rawvideo -video_size 576x108 -pixel_format gray -i yuvyuv.yuv -pix_fmt gray yuvyuv_gray.png'))
    
    nv12 = cv2.imread('nv12_gray.png', cv2.IMREAD_GRAYSCALE)
    cols, rows = nv12.shape[1], nv12.shape[0]*2//3
    
    nv12_y_data = nv12[0:rows, :].flatten()
    nv12_u_data = nv12[rows:, 0::2].flatten()
    nv12_v_data = nv12[rows:, 1::2].flatten()
    
    yuv444_res = np.zeros((rows, cols, 3), np.uint8)
    
    for h in range(rows):
        # centralize yuv 444 data for inference framework
        for w in range(cols):
            yuv444_res[h][w][0] = (nv12_y_data[h * cols + w]).astype(np.int8)
            yuv444_res[h][w][1] = (nv12_u_data[int(h / 2) * int(cols / 2) + int(w / 2)]).astype(np.int8)
            yuv444_res[h][w][2] = (nv12_v_data[int(h / 2) * int(cols / 2) + int(w / 2)]).astype(np.int8)
    
    y = nv12[0:rows, :]
    shrunk_u = nv12[rows:, 0::2].copy()
    shrunk_v = nv12[rows:, 1::2].copy()
    
    u = cv2.resize(shrunk_u, (cols, rows), interpolation=cv2.INTER_NEAREST)  # Resize U channel (use NEAREST interpolation - fastest, but lowest quality).
    v = cv2.resize(shrunk_v, (cols, rows), interpolation=cv2.INTER_NEAREST)  # Resize V channel
    
    yuv444 = np.dstack((y, u, v))
    
    is_equal = np.all(yuv444 == yuv444_res)
    print('is_equal = ' + str(is_equal))  # is_equal = True
    
    # Convert to RGB for display
    yvu = np.dstack((y, v, u))  # Use COLOR_YCrCb2BGR, because it uses the correct conversion coefficients.
    rgb = cv2.cvtColor(yvu, cv2.COLOR_YCrCb2BGR)
    
    # Show results:
    cv2.imshow('nv12', nv12)
    cv2.imshow('yuv444_res', yuv444_res)
    cv2.imshow('yuv444', yuv444)
    cv2.imshow('rgb', rgb)
    cv2.waitKey()
    cv2.destroyAllWindows()
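
    As recommended above, the nearest-neighbor resize can be swapped for linear interpolation when quality matters more than bit-exact equality with the for-loop reference. A sketch reusing the variables from the sample (the equality check will no longer pass):

    # Smoother chroma upsampling - the result is no longer identical to yuv444_res.
    u_lin = cv2.resize(shrunk_u, (cols, rows), interpolation=cv2.INTER_LINEAR)
    v_lin = cv2.resize(shrunk_v, (cols, rows), interpolation=cv2.INTER_LINEAR)
    yuv444_linear = np.dstack((y, u_lin, v_lin))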
    

    Input (NV12 displayed as Grayscale): [image]

    Output (after converting to RGB): [image]