
Why does the WCAG contrast formula use the luminance and not the perceived lightness?


The WCAG algorithm for determining the contrast between two colors uses the luminance of those colors. That is: sRGB is converted to linear RGB, then the three channels are multiplied by their weights (R 0.2126, G 0.7152, B 0.0722) and summed to get the luminance. (The document calls it relative luminance.) This is the same formula used to obtain the Y channel when converting sRGB to the CIE XYZ color space. The two luminances are then plugged into a simple formula, (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter and L2 the darker color, to get the contrast.
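
As a minimal sketch of that calculation (the function names are illustrative and not taken from WCAG; colors are assumed to be 8-bit sRGB triples):

```javascript
// Convert one 8-bit sRGB channel (0-255) to linear light.
// Current WCAG text uses the 0.04045 threshold; older text used 0.03928.
// For 8-bit inputs both thresholds select the same branch.
function srgbChannelToLinear(value8bit) {
  const c = value8bit / 255;
  return c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
}

// Relative luminance: weighted sum of the linear channels (the Y of CIE XYZ).
function relativeLuminance([r, g, b]) {
  return (
    0.2126 * srgbChannelToLinear(r) +
    0.7152 * srgbChannelToLinear(g) +
    0.0722 * srgbChannelToLinear(b)
  );
}

// WCAG 2.x contrast ratio: (lighter + 0.05) / (darker + 0.05).
function wcagContrast(rgbA, rgbB) {
  const yA = relativeLuminance(rgbA);
  const yB = relativeLuminance(rgbB);
  return (Math.max(yA, yB) + 0.05) / (Math.min(yA, yB) + 0.05);
}

// White on black gives the maximum ratio of about 21.
console.log(wcagContrast([255, 255, 255], [0, 0, 0]));
```

Because the lighter luminance always goes in the numerator, the order of the two arguments does not matter.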

However, the luminance is not the perceived lightness of a color, at least according to this answer and also the Lch/Lab color space. There, after calculating the luminance Y, it is converted to the perceived lightness L* using another non-linear formula. To my knowledge, this is also how the L component is obtained when converting sRGB into the Lch/Lab color space.
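
A minimal sketch of that second step, using the CIE 1976 L* definition and assuming Y is already normalized so that the reference white has Y = 1 (the function name is illustrative):

```javascript
// Convert relative luminance Y (0..1, white = 1) to CIE L* (0..100).
function luminanceToLstar(y) {
  const epsilon = 216 / 24389; // about 0.008856, end of the linear segment
  const kappa = 24389 / 27;    // about 903.3, slope of the linear segment
  return y <= epsilon ? kappa * y : 116 * Math.cbrt(y) - 16;
}

// An 18% gray lands a little under L* 50, which is why it is
// commonly treated as a perceptual "middle gray".
console.log(luminanceToLstar(0.18));
```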

I don't understand: Why is the WCAG using the luminance instead of the perceived lightness? If the WCAG contrast should reflect the human-perceived contrast between two colors, then the perceived lightness should be used, right?


Here I created 50 color pairs from evenly spaced Lch colors. The two colors in each pair are 2% L apart (1/50 of the range), i.e. the color pairs:

  • lch(0% 0 0) and lch(2% 0 0)
  • lch(2% 0 0) and lch(4% 0 0)
  • lch(4% 0 0) and lch(6% 0 0)
  • ...
  • lch(98% 0 0) and lch(100% 0 0)

I then calculated the WCAG contrast according to the official algorithm and plotted that (darker colors are left, brighter ones are on the right):

As you can see we get some non-linear relationship, which makes sense when looking at the formulas. But I would have expected a constant or at least linear relationship here.
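
A rough way to reproduce that calculation is sketched below. It is not the code used for the plot: it assumes achromatic colors, converts L* straight back to Y with the inverse CIE formula, and skips the round trip through 8-bit sRGB, so the values can differ slightly from the plotted ones.

```javascript
// Inverse of the CIE L* formula: L* (0..100) back to relative luminance Y (0..1).
function lstarToLuminance(lstar) {
  const kappa = 24389 / 27; // about 903.3
  return lstar > 8 ? ((lstar + 16) / 116) ** 3 : lstar / kappa;
}

// WCAG 2.x ratio from two luminances.
function wcagRatioFromLuminance(y1, y2) {
  return (Math.max(y1, y2) + 0.05) / (Math.min(y1, y2) + 0.05);
}

// Build the 50 pairs: (0, 2), (2, 4), ..., (98, 100).
const pairs = [];
for (let lower = 0; lower < 100; lower += 2) {
  const upper = lower + 2;
  pairs.push({
    lower,
    upper,
    ratio: wcagRatioFromLuminance(lstarToLuminance(lower), lstarToLuminance(upper)),
  });
}

for (const { lower, upper, ratio } of pairs) {
  console.log(`L* ${lower} vs ${upper}: ${ratio.toFixed(4)}`);
}
```

Every pair has the same L* difference, yet the printed ratios are not constant across the range, which is the mismatch the plot shows.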


So then I thought: Maybe the WCAG is wrong and did a woopsie? So I again created color pairs in the Lch color space (each 5% lightness apart this time) and created divs with background color and colored text with those. The number in the div is the WCAG contrast.

(Of course, the image uses 8-bit sRGB, so there are minor rounding errors.)

Interestingly, at least to my eye and on most of my screens, the WCAG contrast is accurate. Specifically, the dark ones at the top are harder to read, with the ones in row 2 and 3 being easiest to read. So is the "perceived lightness" formula linked above wrong?

At least in my brain, it can't be that the "perceived brightness" and the WCAG contrast both accurately represent the human perception when it comes to lightness and contrast. These two things seem completely linked to me.

Can anyone explain what's going on?


Solution

  • Short Answer

    The answer by @Andy May 3rd is good (and thanks for linking to one of my articles).

    The present post is to expand on a couple things that may be of interest.

    Longer Answer

    "...I don't understand: Why is the WCAG using the luminance instead of the perceived lightness?..."

    Why Y

    The "why" of this has much to do with the politics and process of a standards organization, and in this case with trying to create a general accessibility standard for a technology space (the web) that was at that time (circa 2005-2008) somewhat devoid of accessibility considerations. They created a lot of things; unfortunately, contrast ended up as the Achilles' heel.

    Grilling Weber

    The simple ratio equation is functionally an "inverted Weber" with a 0.05Y added in a furtive attempt to clamp the ratio to something reasonable, with the explanation that it was to model screen flare. The 4.5:1 threshold lacks scientific support, as has been discussed (this linked thread covers the origins in more detail).
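
    One way to see the relationship to Weber is sketched below. This is an illustrative reading rather than a definition from WCAG: the names are made up, and the Weber-style fraction here divides by the darker luminance plus the 0.05 term. Under that reading, the WCAG ratio is exactly that fraction plus one, and the 0.05 keeps the result finite when the darker color is black.

```javascript
const FLARE = 0.05; // the constant added to both luminances in WCAG 2.x

// WCAG 2.x ratio from two relative luminances (lighter first).
function wcagRatio(yLighter, yDarker) {
  return (yLighter + FLARE) / (yDarker + FLARE);
}

// Weber-style fraction: luminance difference over the (flare-adjusted) darker luminance.
function weberWithFlare(yLighter, yDarker) {
  return (yLighter - yDarker) / (yDarker + FLARE);
}

// The two differ only by a constant offset of 1.
const yText = 0.6;
const yBackground = 0.1;
console.log(wcagRatio(yText, yBackground), 1 + weberWithFlare(yText, yBackground));

// Without the 0.05 term, white on pure black would divide by zero;
// with it, the ratio tops out at about 21.
console.log(wcagRatio(1, 0));
```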

    The Weber fraction dates back to the 1800s, and it has been used for many different types of perception. Weber defines the just noticeable difference (JND) threshold. Keeping in mind that there was no such thing as a self-illuminated display for an electronic computer back in the 1800s, because neither had been invented yet and "advanced technology" back then was the steam engine, we have our first clues as to why Weber might not be the ideal solution for predicting contrast.

    For text on a display, we are actually not interested in the threshold JND. We are interested in the suprathreshold range, well above the threshold, as that is where contrast needs to be for fluent readability.

    We began the research for a replacement method in 2019, and found that even then there was no method that was particularly good at predicting the contrast of text on self-illuminated displays in a way that matched empirical data. This led to the creation of several new methods.

    Lstar Wars

    One of the first alternatives we tested was ∆L* (calculated from CIELAB, i.e. the LCH you were using). Finding the difference between two Lstars was one of the popular methods of predicting contrast. In our tests, though, we found it was not significantly better than WCAG2's contrast math.

    L* was created in 1976 by the CIE, for LAB and LUV, and is based on Munsell value. Munsell value is derived from empirical studies involving low spatial frequency, diffuse reflecting color patches in a defined illumination environment. Once again, not a self-illuminated display.

    So even if WCAG 2 had used ∆L*, the reality is that the end results would not be appreciably improved, with dark colors being unreadable. It is partly a case of some aspects of technology growing faster than others. There have been a number of advances in our understanding of contrast over the last decade and a half, and particularly over the last 4½ years.

    𝛥𝛷✵

    Delta Phi Star or DPS Contrast takes the standard L* (D65) and adds a little extra math to help coax it into a better predictor of perceptual contrast. Where L* is 0-100:

    let dps = (Math.abs(bgLstar ** 1.618 - txLstar ** 1.618) ** 0.618) * 1.414 - 40 ;
    

    DPS is decent in the midrange, but it doesn't take into consideration things like polarity, i.e. light mode versus dark mode.
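
    As a small sketch, the expression above can be wrapped in a function (the name dpsContrast is just for illustration, and both inputs are assumed to be L* values in the 0-100 range). Because of the Math.abs, swapping text and background gives the same number, which is the missing polarity sensitivity just mentioned.

```javascript
// Delta Phi Star contrast from two L* values (0..100), using the expression quoted above.
function dpsContrast(bgLstar, txLstar) {
  const difference = Math.abs(bgLstar ** 1.618 - txLstar ** 1.618);
  return difference ** 0.618 * 1.414 - 40;
}

// Dark text on a light background and the reverse give the same value.
console.log(dpsContrast(90, 30), dpsContrast(30, 90));

// Identical colors return the -40 offset.
console.log(dpsContrast(50, 50));
```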

    APCA

    The Accessible Perceptual Contrast Algorithm uses multiple power curves to shape a resultant perceptual contrast that is reasonably uniform across the visual range, as well as incorporating polarity sensitivity, and considering spatial characteristics. We have a brief overview: "Why APCA"

    "...I then calculated the WCAG contrast according to the official algorithm and plotted that..."

    That's an interesting plot, and does show the slight difference between WCAG2 and ∆L*, but as I mentioned, even ∆L* is not particularly accurate in predicting contrast of text at the higher levels needed.

    Color difference at threshold JND and perceived contrast of high spatial frequency stimuli at suprathreshold levels have significantly different characteristics across the visual range.

    "...So then I thought: Maybe the WCAG is wrong and did a woopsie?..."

    Yes, WCAG2 contrast is wrong. However, it is also useful to note that circa 2005, here in the film industry where I spent most of my professional career, we were going through the transition from chemical imaging to digital imaging. The film/TV industry ran into similar "understanding issues" during this transition, as visual perception is a complex, abstract, and nuanced subject matter.

    If there was a "woopsie", in my opinion it was the failure to consider the body of research of Lovie-Kitchin et al. and the contrast models of Barten. There was in fact a lot of good science on readability and contrast available at the time that was not referenced; what was referenced instead were some essentially obsolete (circa 1988) standards for CRT matrix-type monochrome displays. But at that time it was a voluntary guideline, and they had a lot of other material that needed attention.

    "...at least to my eye and on most of my screens, the WCAG contrast is accurate. Specifically, the dark ones at the top are harder to read, with the ones in row 2 and 3 being easiest to read..."

    Not accurate, actually. On a hardware-calibrated monitor the top row (1) is unreadable, and the next row is not much better. The middle row to the bottom row are about the same. If WCAG2 were accurate, the top row would read about 1.02 to 1.03 (estimate), but WCAG2 wrongly inflates the reported contrast for dark colors.

    And here's one of the fun things about the human vision system: looking at the same stimulus, the perception of it can change over time, and also change based on the surrounding context, and a number of other factors.

    KEYS:

    • You can't judge magnitude with just a glance — look at each patch for at least five seconds so that you have some amount of adaptation to the patch.
    • Spatial is helpful here, as our contrast perception is more tied to the spatial characteristics, in other words line thickness or font weight, than the color.
      • Zoom out to make the image smaller (or farther away) until you cannot see any of the text, then slowly zoom it larger until you can just barely see some text, and which row do you see first?
      • For me, on this calibrated display at 120 nits, it was the bottom row.
    • Display brightness matters too, not just calibration. Increasing the brightness on my display and doing the same zoom, I find the second-to-bottom row is perhaps slightly better (my older eyes are susceptible to glare).

    In other words, the absolute distance between two colors is not the only determiner of contrast. Other factors are:

    • the spatial characteristics of the stimuli (line thickness),
    • the eye's relative adaptation to the environment and the overall screen,
    • the context of what a particular color pair is sitting next to,
    • the total brightness of the display,
    • the gamma of the display,
    • the age of the eye, etc.

    "...So is the "perceived lightness" formula linked above wrong?...it can't be that the "perceived brightness" and the WCAG contrast both accurately represent the human perception..."

    These tests that you've shown are very close to that just noticeable difference threshold. At such a low level, so close to threshold, you're not going to find "substantial" differences between many of the common contrast maths.

    Looking at the first column, and keeping in mind that 8 bit rounding means ±0.5%, assuming a typically bright environment for adaptation, and the display in light mode so the sample image is surrounded by very light or white, then a more perceptual method could return:

    Row   Contrast
    1     <1%
    2     <2%
    3     3.5%
    4     4.5%
    5     5%
    You might find it interesting if you set up those patches with different fonts. For instance, try Montserrat light (300 weight) and compare to something very bold like Arial Black or Helvetica 900.

    Contrast Condition Characteristics

    A point: don't expect simple math to accurately describe the characteristics of the human visual system (HVS). There are too many conditions that affect perception.

    There are some things we can assume, such as typical office lighting of 350 to 500 lux, an sRGB-type display at 120 nits, and an overall surround luminance on the display of approximately #e1e1e1. This is a common set of conditions that is also "difficult for contrast". For instance, if the entire surrounding screen is set to black, perceived contrast will generally increase. So the brighter lighting environment will impact contrast, and in particular make darker colors more difficult to read.
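
    To put a rough number on that surround, the sketch below converts #e1e1e1 to relative luminance and scales it by the display's white level. It assumes an ideal sRGB response and that display white is exactly 120 cd/m²; the helper names are illustrative, and a real display will deviate from this.

```javascript
// Linearize one 8-bit sRGB channel (0-255).
function srgbChannelToLinear(value8bit) {
  const c = value8bit / 255;
  return c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
}

// Relative luminance of a 6-digit hex color such as "#e1e1e1".
function hexToRelativeLuminance(hex) {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = srgbChannelToLinear((n >> 16) & 255);
  const g = srgbChannelToLinear((n >> 8) & 255);
  const b = srgbChannelToLinear(n & 255);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

const ASSUMED_WHITE_NITS = 120; // assumption: display white level in cd/m²
const surroundY = hexToRelativeLuminance("#e1e1e1");
const surroundNits = surroundY * ASSUMED_WHITE_NITS;

// Roughly three quarters of display white, i.e. around 90 cd/m² here.
console.log(surroundY.toFixed(3), surroundNits.toFixed(1));
```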

    Notice: any opinions expressed are my own and do not necessarily reflect those of the W3C or AGWG.