My question is specifically about Metal, since I don't know whether the answer would change for another API.
What I believe I understand so far is this:
A mipmapped texture has precomputed "levels of detail", where lower levels of detail are created by downsampling the original texture in some meaningful way.
Mipmap levels are numbered in decreasing order of detail, where level 0 is the original texture and each higher level is a power-of-two reduction of it.
Most GPUs implement trilinear filtering, which picks two neighboring mipmap levels for each sample, samples from each level using bilinear filtering, and then linearly blends those samples.
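(In MSL-flavored pseudocode, my mental model of that blend looks something like the sketch below; it's purely illustrative and uses the level() option I ask about further down.)

```cpp
#include <metal_stdlib>
using namespace metal;

// Illustrative sketch only: what I understand trilinear filtering to
// compute for a fractional level of detail "lambda".
float4 trilinear_sketch(texture2d<float> tex, sampler smp,
                        float2 uv, float lambda)
{
    float4 lower = tex.sample(smp, uv, level(floor(lambda)));        // bilinear sample at level N
    float4 upper = tex.sample(smp, uv, level(floor(lambda) + 1.0f)); // bilinear sample at level N + 1
    return mix(lower, upper, fract(lambda));                         // linear blend between the two
}
```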
What I don't quite understand is how these mipmap levels are selected. In the documentation for the Metal standard library, I see that samples can be taken with or without specifying an instance of a lod_options type. I would assume that this argument changes how the mipmap levels are selected, and there are apparently three kinds of lod_options for 2D textures:
bias(float value)
level(float lod)
gradient2d(float2 dPdx, float2 dPdy)
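For reference, here is roughly how I'd expect these to appear at a call site, going by the MSL headers (the texture/sampler bindings are illustrative, and the comments are just my guesses, which is exactly what I'm asking about):

```cpp
#include <metal_stdlib>
using namespace metal;

struct FragIn {
    float4 position [[position]];
    float2 uv;
};

fragment float4 lod_options_demo(FragIn in [[stage_in]],
                                 texture2d<float> tex [[texture(0)]],
                                 sampler smp [[sampler(0)]])
{
    // No lod_options: the LOD is chosen implicitly (the part I'm asking about).
    float4 implicit = tex.sample(smp, in.uv);

    // bias(): presumably offsets whatever LOD would have been chosen implicitly.
    float4 biased = tex.sample(smp, in.uv, bias(1.0f));

    // level(): presumably forces an explicit (fractional?) LOD outright.
    float4 forced = tex.sample(smp, in.uv, level(2.0f));

    // gradient2d(): presumably derives the LOD from caller-supplied
    // texture-coordinate derivatives.
    float4 graded = tex.sample(smp, in.uv,
                               gradient2d(dfdx(in.uv), dfdy(in.uv)));

    return (implicit + biased + forced + graded) * 0.25f;
}
```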
Unfortunately, the documentation doesn't bother explaining what any of these options do. I can guess that bias() biases some automatically chosen level of detail, but then what does the bias value mean? What scale does it operate on? Similarly, how is the lod of level() translated into discrete mipmap levels? And, operating under the assumption that gradient2d() uses the gradient of the texture coordinate, how does it use that gradient to select the mipmap level?
More importantly, if I omit the lod_options, how are the mipmap levels selected then? Does this differ depending on the type of function being executed?
And, if the default no-lod-options-specified operation of the sample() function is to do something like gradient2d() (at least in a fragment shader), does it use simple screen-space derivatives, or does it work directly with the rasterizer's interpolated texture coordinates to calculate a precise gradient?
And finally, how consistent is any of this behavior from device to device? An old article (old as in DirectX 9) I read referred to complex device-specific mipmap selection, but I don't know whether mipmap selection is better defined on newer architectures.
This is a relatively big subject, and you might be better off asking on https://computergraphics.stackexchange.com/, but very briefly: Lance Williams' paper "Pyramidal Parametrics", which introduced trilinear filtering and the term "MIP mapping", contains a suggestion from Paul Heckbert (see page three, first column) that I think may still be used, to an extent, in some systems.
In effect, the approaches to computing the MIP map level usually start from the assumption that your screen pixel is a small circle, and that this circle can be mapped back onto the texture to give, approximately, an ellipse. You estimate the length of the ellipse's longer axis, expressed in texels of the highest-resolution map; that tells you which MIP maps you should sample. For example, if the length were 6 texels, i.e. between 2^2 and 2^3, you'd want to blend between MIP map levels 2 and 3 (log2(6) ≈ 2.58, so with roughly 58% weight on level 3).
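To make that concrete, the longer-axis rule boils down to something like the sketch below. This is only the textbook rule; real hardware approximates it in device-specific ways, and the function name and texSize parameter are purely illustrative:

```cpp
#include <metal_stdlib>
using namespace metal;

// Estimate a fractional LOD from texture-coordinate derivatives,
// per the "longer axis of the footprint ellipse" idea above.
float lod_from_footprint(float2 dUVdx, float2 dUVdy, // uv change per pixel step in x and y
                         float2 texSize)             // level-0 texture size in texels
{
    // Express the per-pixel derivatives in texels of the level-0 map.
    float2 dx = dUVdx * texSize;
    float2 dy = dUVdy * texSize;

    // Approximate the ellipse's longer axis by the larger gradient length.
    float rho = max(length(dx), length(dy));

    // e.g. rho = 6 texels -> log2(6) ~= 2.58: blend MIP levels 2 and 3,
    // with roughly 58% weight on level 3.
    return log2(rho);
}
```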