Search code examples
algorithmmathcolorsaggregatemedian

Aggregating HSL Values


I have written a piece of code that takes an input HSL color value, and categorizes it as one of eight predefined colors. Because the color of the object I'm measuring is not perfectly "smooth" (the exact H, S, and L values vary pixel to pixel, but there is a limited range each can fall into depending on the color), I want to aggregate the H's, S's, and L's of several pixels on the object, before identifying the resulting HSL value as a particular color.

To give an example, if the object is in fact black, then the H of any of its pixels should be in the range 24 - 33, while the H range is 33 - 37 for yellow. Aggregating several measured H's, instead of relying on a single measurement, will tend to yield a result closer to the middle of whichever range corresponds to the correct color, which desirably reduces the likelihood of needing to interpret the ambiguous 33 case.

I have been using something similar to median as my aggregation algorithm (exact algorithm shown below), but I've come across one case where this doesn't work well. In particular, the H range for a purple object includes both 231 - 240 and 0 - 10 (240 is the maximum H value, so it wraps around). All other colors' possible H's, S's, and L's are single, continuous ranges, for which the median approach (or my modified version) works well, but it isn't ideal in the purple H case, because it yields results which are actually closer to the edge of its range, so the input HSL value is more likely to be mistaken for another color (red's H range is 9 - 14).

Is there a better aggregation algorithm than median for this task, one that will tend to produce results nearer the middle of a color's H, S, and L ranges, even in the wrap around purple H case?

The algorithm:

private hslColor aggregateEachAttribute(hslColor[] hslData)
{
    List<double> hAttributes = new List<double>();
    List<double> sAttributes = new List<double>();
    List<double> lAttributes = new List<double>();

    for (int i = 0; i < hslData.Length; i++)
    {
        hAttributes.Add(hslData[i].H);
    }

    for (int i = 0; i < hslData.Length; i++)
    {
        sAttributes.Add(hslData[i].S);
    }

    for (int i = 0; i < hslData.Length; i++)
    {
        lAttributes.Add(hslData[i].L);
    }

    hAttributes.Sort();
    sAttributes.Sort();
    lAttributes.Sort();

    while (hAttributes.Distinct().Count() >= 3)
    {
        hAttributes.RemoveAll(h => h == hAttributes[0]);
        hAttributes.RemoveAll(h => h == hAttributes[hAttributes.Count() - 1]);
    }

    while (sAttributes.Distinct().Count() >= 3)
    {
        sAttributes.RemoveAll(s => s == sAttributes[0]);
        sAttributes.RemoveAll(s => s == sAttributes[sAttributes.Count() - 1]);
    }

    while (lAttributes.Distinct().Count() >= 3)
    {
        lAttributes.RemoveAll(l => l == lAttributes[0]);
        lAttributes.RemoveAll(l => l == lAttributes[lAttributes.Count() - 1]);
    }

    return new hslColor(hAttributes[0], sAttributes[0], lAttributes[0]);
}

Solution

  • If you just have wrap-around to worry about, and your points are either < X or > 240-X for some reasonably small X, you could calculate a median by adding X to your data mod 240, calculating a median, and then subtracting X from the result mod 240.

    More generally, you could look for a median or medoid by finding the value that minimizes a sum of distances, where d(x, y) = min(|x-y|, |240 + x - y|, |240 + y - x|). I think you could calculate this in time O(n) after a sort that takes O(n log n) with some rather fiddly code in which you calculate the sum of distances associated with each candidate median or mediod, dividing the points into two sets according to whether they are clockwise or anticlockwise from the current point and updating the sum of distances incrementally as you move from one candidate median or medoid to the next.