Difference between dFdxFine and dFdxCoarse

From the OpenGL documentation:

dFdxFine and dFdyFine calculate derivatives using local differencing based on the value of p for the current fragment and its immediate neighbor(s).

dFdxCoarse and dFdyCoarse calculate derivatives using local differencing based on the value of p for the current fragment's neighbors, and will possibly, but not necessarily, include the value for the current fragment. That is, over a given area, the implementation can compute derivatives in fewer unique locations than would be allowed for the corresponding dFdxFine and dFdyFine functions.

What is the difference between them? When should I care?

I understand that both calculate the derivative of the value with respect to the window coordinates, but I don't understand the method used to compute them.

I guess that they are both implemented in hardware, but could you post a dFdx pseudo-code implementation?

Solution

From the GLSL spec:

It is typical to consider a 2x2 square of fragments or samples, and compute independent dFdxFine per row and independent dFdyFine per column, while computing only a single dFdxCoarse and a single dFdyCoarse for the entire 2x2 square.

The derivatives are computed by numerical differentiation, i.e. by finite differences between neighboring fragments. For simplicity, assume we are rendering into a single-sampled framebuffer and want to compute dFdx(a). Typically, a 2x2 square of neighboring fragments is shaded simultaneously (i.e. within the same workgroup):

    a00  a10
    a01  a11

Conceptually, all shader invocations compute their value a, write it to shared memory, and issue a barrier. After the barrier, the derivatives can be approximated as follows, where dx is the window-space distance between horizontally adjacent fragments (one pixel):

    dFdxFine(a) = (a10 - a00)/dx       at xy = 00, 10
    dFdxFine(a) = (a11 - a01)/dx       at xy = 01, 11

For the coarse derivatives, the specification explicitly permits computing only one derivative for the whole 2x2 block of pixels. So a conforming implementation could just as well compute:

    dFdxCoarse(a) = (a10 - a00)/dx     at xy = 00, 10, 01, 11
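Since the question asked for pseudo-code, here is a minimal CPU-side sketch of the two formulas in Python. It only models one 2x2 quad and is not how any particular GPU implements these functions. The names `quad`, `DX`, `dfdx_fine` and `dfdx_coarse` are made up for illustration, and taking the coarse difference from the top row is just one of the choices the spec allows.

```python
# Hypothetical model of a single 2x2 quad; real hardware does this in silicon.
# quad[y][x] holds the value `a` computed by the fragment at column x, row y,
# so quad[0] is the row (a00, a10) and quad[1] is the row (a01, a11).

DX = 1.0  # assumed spacing between horizontally adjacent fragments (one pixel)


def dfdx_fine(quad, x, y):
    """Per-row difference: both fragments of row y share that row's derivative."""
    return (quad[y][1] - quad[y][0]) / DX


def dfdx_coarse(quad, x, y):
    """One difference (taken from the top row here) shared by all four fragments."""
    return (quad[0][1] - quad[0][0]) / DX


# Example values: a00 = 1, a10 = 3 (top row); a01 = 2, a11 = 7 (bottom row)
quad = [[1.0, 3.0],
        [2.0, 7.0]]

for y in range(2):
    for x in range(2):
        print((x, y), dfdx_fine(quad, x, y), dfdx_coarse(quad, x, y))
```

With these example values, the top row gets 2.0 from both variants, while the bottom row gets 5.0 from the fine variant and 2.0 from the coarse one. That is exactly the case where the two functions return different results.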

Whether there is a difference in performance between the two depends on the hardware. If they do return different results on your hardware, though, then the 'coarse' version should be faster. Usually, however, you should not need to care about these functions. Simply use dFdx and dFdy, which map to whichever variant (fine or coarse) the implementation uses by default.