I'm trying to detect an object's color in an efficient way. Let's assume I run a YOLO model and crop the object region given the bounding boxes. Given the cropped object image, what's the most efficient and accurate way to detect the color of the object?
Previously, I trained a YOLO model to detect the color (10 classes of colors), but running 2 deep learning models is too slow for my real-time requirements. I need the color detection/classification part to be very fast, preferably not using deep learning. Maybe pure Python or OpenCV or whatnot.
I wrote this piece of code that resizes the image to a 1x1 pixel. I then visualize the color in a square. But it's not accurate at all, just way off.
from PIL import Image

def get_dominant_color(pil_img):
    img = pil_img.copy()
    img = img.convert("RGBA")
    # resample=0 is NEAREST, so this effectively samples a single pixel
    img = img.resize((1, 1), resample=0)
    dominant_color = img.getpixel((0, 0))
    return dominant_color
# Specify the path to your image
image_path = "path/to/your/image.jpg"
# Open the image using PIL
image = Image.open(image_path)
# Get the dominant color
dominant_color = get_dominant_color(image)
# Print the color in RGB format
print("Dominant Color (RGB):", dominant_color[:3])
# Create a new image with a 100x100 square of the dominant color
square_size = 100
square_image = Image.new("RGB", (square_size, square_size), dominant_color[:3])
# Display the square image
square_image.show()
Let me summarize the voluminous content of my answer below in a single sentence:
There is no coding solution to expectations based on wrong assumptions.
In other words, your question "How to detect car color efficiently?" rests on the false assumption that it is possible to detect car color without further segmentation of the image and a deep analysis of entire areas of the extracted segments, along with the relationships between those areas.
If you still want to try for a coding solution anyway, I suggest an "upside down" approach to what you want to achieve, consisting of the following steps:
As you arrive at tuning, you will probably notice why A10, in a comment on your question, speaks of a hard problem. Is a white car not a gray car in very cloudy weather, shortly before the sun goes down?
The color intuition you gain from analyzing all the image details and the image content goes far beyond a fast and simple color comparison: you see in the image whether the sun is shining or whether it is cloudy or dark, and you intuitively adjust your color criteria accordingly, for example by taking cast shadows and car window areas out of consideration. In other words, you will probably realize that there is no way around training a learning model on gigabytes of images, and no way around using faster hardware to achieve higher processing speed.
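A crude stand-in for "taking shadows and car window areas out of consideration" can at least be sketched: mask out nearly black pixels before averaging. The function name, brightness cutoff, and fallback below are illustrative assumptions, not tuned values:

```python
import numpy as np
from PIL import Image

def masked_mean_rgb(pil_img, min_value=40, min_count=50):
    """Mean RGB over pixels that are not nearly black.

    min_value is an assumed brightness cutoff (0-255) meant to drop
    shadows and dark window glass; min_count guards against crops
    where almost every pixel was masked out."""
    rgb = np.asarray(pil_img.convert("RGB"), dtype=float).reshape(-1, 3)
    brightness = rgb.max(axis=1)        # the V channel of HSV
    keep = brightness >= min_value
    if keep.sum() < min_count:          # too few usable pixels left
        return tuple(rgb.mean(axis=0))  # fall back to the plain mean
    return tuple(rgb[keep].mean(axis=0))
```

On a crop whose upper half is dark window glass, the masked mean stays close to the paint color, while a plain mean is pulled toward black. It is still only a pixel statistic, though, not the scene-level reasoning described above.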
To dig a bit into the reasons behind your unrealistic expectations, I suggest you read about the color silver, for example on Wikipedia. Here is an excerpt:
The visual sensation usually associated with the metal silver is its metallic shine. This cannot be reproduced by a simple solid color because the shiny effect is due to the material's brightness varying with the surface angle to the light source.
from which you can see that it is impossible to identify a silver car color, because silver is not an R,G,B color of an image pixel, but the result of the overall impression of the entire colored surface under the given lighting conditions, perceived in the context of all the details in the image.
Notice that, considering the above, the actual purpose of the suggested step of:
- define the exact rgb() values of all car colors you want to detect in the images. For example, as you suggested in a comment, the rgb() values of 12 colors: red, white, black, silver, gray, yellow, orange, blue, pink, brown, beige and green.
is to make you aware of the limitations of the proposed approach and of the fact that you won't be able to define silver, white and gray colors in a way that matches the car color as perceived by the human eye.
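To see that limitation concretely, here is a minimal nearest-color sketch. The 12 anchor values below are illustrative assumptions, not a tested palette; notice how close silver, gray, and white sit to each other, so a slightly brighter silver highlight already classifies as white:

```python
import numpy as np

# Illustrative anchor values only -- real car paints will not match these.
COLOR_ANCHORS = {
    "red":    (200, 30, 40),
    "white":  (245, 245, 245),
    "black":  (20, 20, 20),
    "silver": (192, 192, 192),
    "gray":   (128, 128, 128),
    "yellow": (230, 200, 30),
    "orange": (240, 130, 30),
    "blue":   (30, 60, 180),
    "pink":   (240, 150, 180),
    "brown":  (120, 75, 45),
    "beige":  (210, 190, 160),
    "green":  (40, 140, 60),
}

def nearest_color(rgb):
    """Return the anchor name with the smallest Euclidean RGB distance."""
    names = list(COLOR_ANCHORS)
    anchors = np.array([COLOR_ANCHORS[n] for n in names], dtype=float)
    dists = np.linalg.norm(anchors - np.asarray(rgb, dtype=float), axis=1)
    return names[int(np.argmin(dists))]

# A bright silver highlight lands closer to white than to silver:
print(nearest_color((230, 230, 230)))  # -> "white"
```

Saturated colors like red survive this scheme; it is exactly the achromatic silver/white/gray band where plain RGB distance falls apart.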
By the way, compared to the code you posted in your question, it could help to crop the images to get rid of the frames indicating the cars, then split each image into two halves, an upper and a lower one, and decide on a result by comparing the results for both halves. Splitting into upper and lower parts can help to eliminate the impact of large dark car windows and to detect inconsistencies in the color detection. Best would be a detection able to provide a precise contour, not only a rectangle; this would eliminate the noise from the road surface and the other surroundings.
Depending on the results of evaluating a huge number of images, you may decide to divide the images in the vertical direction into three parts instead of two to improve the results. Notice also that the effectiveness of dividing the image into strips depends on the camera perspective: for the white and yellow cars a simple vertical split will be sufficient, but for the silver and red cars vertical splitting combined with cropping a parallelogram will give better results. In other words, the camera perspective should be a parameter evaluated by the method determining the car color.
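The crop-and-split idea above can be sketched as follows. The function names, the relative crop margin, and the consistency threshold are all illustrative assumptions:

```python
import numpy as np
from PIL import Image

def mean_rgb(img):
    """Mean RGB over all pixels of a PIL image."""
    return tuple(np.asarray(img.convert("RGB"), dtype=float)
                 .reshape(-1, 3).mean(axis=0))

def halves_mean_color(img, margin=0.1):
    """Crop an assumed relative margin off each side (to drop the frame
    around the car), split into upper and lower halves, and return the
    mean color of each half."""
    w, h = img.size
    mx, my = int(w * margin), int(h * margin)
    cropped = img.crop((mx, my, w - mx, h - my))
    cw, ch = cropped.size
    upper = cropped.crop((0, 0, cw, ch // 2))
    lower = cropped.crop((0, ch // 2, cw, ch))
    return mean_rgb(upper), mean_rgb(lower)

def colors_consistent(c1, c2, threshold=60.0):
    """Illustrative consistency check on Euclidean RGB distance; a large
    distance between the halves signals e.g. dark windows dominating
    the upper half."""
    return float(np.linalg.norm(np.subtract(c1, c2))) <= threshold
```

When the two halves disagree, that is the signal to distrust the detection or to fall back to the lower half, which is less likely to contain window glass.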