Tags: python, tensorflow, image-processing, color-space, tensorflow-io

What is the difference between tfio.experimental.color.rgb_to_ycbcr and tf.image.rgb_to_yuv?


For a computer vision project I want to compare the performance of my model programmed in TensorFlow on images in different colour spaces. I am specifically interested in the YCbCr colour space as it has the potential to express skin colours better than RGB, but I struggle to choose the correct function to transform the images inside of the data preprocessing pipeline.

The TensorFlow I/O addon library provides the function [rgb_to_ycbcr()](https://github.com/tensorflow/io/blob/master/tensorflow_io/python/experimental/color_ops.py) that I believe does what I want it to do, but I prefer to use a function in a non-experimental state if possible.

The main TensorFlow library provides the function [rgb_to_yuv()](https://www.tensorflow.org/api_docs/python/tf/image/rgb_to_yuv). As far as I have learned from Wikipedia and other Stack Overflow questions, the YUV and YCbCr colour spaces are the same, or at least related (one being the analog version and one being the digital version), but I can't find a satisfying answer online as to how exactly the colour spaces relate and what that means for the equivalence, or lack thereof, of these functions.

Therefore, my question: Are these functions the same or are they different, and why? Any general information on the differences/similarities between YCbCr and YUV is also welcome!

I've tried the following code to test the functions:

import cv2
import numpy as np
import tensorflow as tf
import tensorflow_io as tfio
from PIL import Image
from io import BytesIO

out = BytesIO()

with Image.open("camphoto_33463914.png") as img:
    img = img.resize((256, 256))
    img.save(out, format="png")

image_in_bytes = out.getvalue()

# Decode straight from the PNG bytes (no string round trip needed)
image = cv2.cvtColor(
    cv2.imdecode(np.frombuffer(image_in_bytes, np.uint8), cv2.IMREAD_COLOR),
    cv2.COLOR_BGR2RGB,
)
image_tensor_rgb = tf.convert_to_tensor(image, dtype=tf.uint8)

image_tensor_ycbcr = tfio.experimental.color.rgb_to_ycbcr(image_tensor_rgb)

# tf.image.rgb_to_yuv doesn't accept tf.uint8, so cast to float first
image_tensor_yuv = tf.image.rgb_to_yuv(tf.cast(image_tensor_rgb, dtype=tf.float32))

The following image shows the output of image_tensor_ycbcr.

The output of image_tensor_ycbcr

And the following image shows the output of image_tensor_yuv (in float because that's what the function wanted)

The output of image_tensor_yuv

As is visible, the first channel of both outputs is the same apart from some rounding differences, likely coming from the difference in dtype. The second and third channels are quite different, though.

When I plot both images with matplotlib.pyplot.imshow() they end up looking quite different, but this could be attributed to it clipping the values of image_tensor_yuv to the [0..255] range. When I plot the individual channels of both image_tensor_ycbcr and image_tensor_yuv, they look exactly the same.


Solution

  • YCbCr and YUV are often used interchangeably: in some cases YUV is shorthand for YCbCr, and in some cases YUV refers to a wider definition (let's assume the YCbCr standards are a subset of the YUV standards).
    Note that YUV (and YCbCr) covers multiple standards with different conversion formulas - each standard may have a different conversion matrix, different offsets, and different ranges.

    Many times, the exact standard is not well documented, and we have to figure out the standard from the code.


    tf.image.rgb_to_yuv implementation:

    Following the source code, we can see that tf.image.rgb_to_yuv uses the following conversion matrix:

    _rgb_to_yuv_kernel = [[0.299, -0.14714119, 0.61497538],
                          [0.587, -0.28886916, -0.51496512],
                          [0.114, 0.43601035, -0.10001026]]
    

    tf.image.rgb_to_yuv method uses the conversion formula defined in Wikipedia under SDTV with BT.470 (the matrix is transposed).
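    Since the kernel is just a 3x3 matrix, the conversion can be sketched in pure NumPy (NumPy is used here only so the snippet runs without TensorFlow installed; tf.image.rgb_to_yuv performs the same matrix multiply on float input):

```python
import numpy as np

# Kernel copied from the TensorFlow source shown above:
# columns produce Y, U and V; rows consume R, G and B.
_rgb_to_yuv_kernel = np.array([[0.299, -0.14714119, 0.61497538],
                               [0.587, -0.28886916, -0.51496512],
                               [0.114, 0.43601035, -0.10001026]])

def rgb_to_yuv(rgb):
    """rgb: float array, channels last. Returns YUV, channels last."""
    return np.asarray(rgb, dtype=np.float64) @ _rgb_to_yuv_kernel

print(rgb_to_yuv([255.0, 255.0, 255.0]))  # white -> Y = 255, U ~ 0, V ~ 0
print(rgb_to_yuv([0.0, 0.0, 255.0]))      # blue -> U is large and positive, V negative
```

    Note how U and V come out near zero for gray pixels (both kernel columns sum to zero), which is exactly the "no offset" behaviour described below.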

    The main characteristics of rgb_to_yuv are:

    • Y, U and V have no offsets.
    • Y range is [0, 255] (for uint8-scaled input).
    • U and V are centered around 0 and take negative values (no +128 offset is added).
    • The output values are not rounded.

    The above characteristics make rgb_to_yuv adequate for computer vision algorithms (where offsets complicate things).


    tfio.experimental.color.rgb_to_ycbcr:

    Following the source code, we can see that rgb_to_ycbcr uses the rgb_to_ypbpr conversion and adds offsets of [16, 128, 128].

    rgb_to_ycbcr uses the BT.601 conversion formula, with scaling and offsets that apply the "limited range" of BT.601.

    The main characteristics of rgb_to_ycbcr are:

    • Y, Cb and Cr have offsets (the offsets are [16, 128, 128]).
    • Y range is [16, 235].
    • Cb and Cr range is [16, 240].
    • The output values are rounded.
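    A minimal NumPy sketch of this limited-range BT.601 conversion (my own illustration of the formulas above, not the tfio source; the exact rounding in tfio may differ by +/-1):

```python
import numpy as np

# Limited-range BT.601 RGB -> YCbCr, matching the characteristics listed
# above: offsets [16, 128, 128], Y in [16, 235], Cb/Cr in [16, 240].
KR, KG, KB = 0.299, 0.587, 0.114  # BT.601 luma coefficients

def rgb_to_ycbcr_bt601(rgb):
    rgb = np.asarray(rgb, dtype=np.float64) / 255.0  # normalise to [0, 1]
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = KR * r + KG * g + KB * b          # luma in [0, 1]
    pb = 0.5 * (b - y) / (1.0 - KB)       # Pb in [-0.5, 0.5]
    pr = 0.5 * (r - y) / (1.0 - KR)       # Pr in [-0.5, 0.5]
    # Limited-range scaling plus the [16, 128, 128] offsets:
    out = np.stack([16 + 219 * y, 128 + 224 * pb, 128 + 224 * pr], axis=-1)
    return np.round(out).astype(np.uint8)

print(rgb_to_ycbcr_bt601([0, 0, 0]))        # black -> [ 16 128 128]
print(rgb_to_ycbcr_bt601([255, 255, 255]))  # white -> [235 128 128]
```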

    The above characteristics make rgb_to_ycbcr adequate for video encoding (as input for an H.264 video encoder, for example).


    Conclusion:
    If your goal is video encoding, use rgb_to_ycbcr.
    If your goal is image processing / computer vision algorithms, use rgb_to_yuv.
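    To make the relationship between the two functions concrete, here is a small bridge (my own sketch, not part of TensorFlow or tfio) that rescales rgb_to_yuv-style output into the limited-range, offset values rgb_to_ycbcr produces; the U/V scale factors come from the kernel shown earlier:

```python
import numpy as np

# Rescale YUV values produced with the tf.image.rgb_to_yuv kernel
# (applied to uint8-scaled RGB input) into limited-range YCbCr with
# the [16, 128, 128] offsets described above.
U_MAX = 0.43601035  # kernel's U coefficient for B (~0.436)
V_MAX = 0.61497538  # kernel's V coefficient for R (~0.615)

def yuv_to_ycbcr(yuv):
    y, u, v = np.moveaxis(np.asarray(yuv, dtype=np.float64), -1, 0)
    y_c = 16 + (219 / 255) * y               # compress Y into [16, 235]
    cb = 128 + 224 * u / (255 * 2 * U_MAX)   # compress U into [16, 240]
    cr = 128 + 224 * v / (255 * 2 * V_MAX)   # compress V into [16, 240]
    return np.round(np.stack([y_c, cb, cr], axis=-1)).astype(np.uint8)

print(yuv_to_ycbcr([255.0, 0.0, 0.0]))  # white in YUV -> [235 128 128]
```

    Rounding differences of +/-1 against the actual tfio output are possible, but this shows that the two functions encode the same underlying conversion, differing only in scaling and offsets.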