How Does BIOS initialize DRAM?

I've been searching all over for an explanation of how exactly BIOS works now for quite some time. I have designed a bootloader and have jumped to 32-bit mode with it while successfully initializing the IDT as well as the GDT, but in doing so, I have found "operating systems" seeming to be quite simple, and the feeling as if "BIOS" IS the actual operating system of every computer.

So now I have taken on a new challenge of trying to discover how BIOS actually initializes itself, discovers how much RAM is usable, and how/where add-in card ROM gets imported into RAM. To my understanding, the processor, not via a jump but automatically, starts executing code inside RAM at 16-bit segment:offset address 0xFFFF:0x0000. That would mean all computers technically must have at least 1MB worth of RAM initially in order to boot, due to the processor's starting location, and because of that I have been assuming that every BIOS automatically writes itself into RAM before the processor gets its RST signal. I suspect that is not true, as that is exactly what "Shadow BIOS" is, which I believe can be disabled through BIOS. I have been searching everywhere for a "BIOS Designer's Guide"; however, I keep coming out empty-handed with every specification I read.

As a programmer, I understand that there are probably numerous ways to actually accomplish what I am actually asking and that there is probably no way in hell to give a decent straight-forward answer, and if I must be more specific, say I was working with a Dell Inspiron 518, or at least a computer containing a G33 chipset (G33 north bridge and ICH9 south bridge) and I wanted to program the initial Pre-POST program, and build my own 16-bit IDT with all the standard interrupts and everything needed that could potentially boot another operating system such as Windows 10 successfully. How does BIOS actually know how much RAM there is? Does it just do a bit-write bit-read test at the highest memory areas and go down from there? And how do add-in cards ROM even get loaded into RAM? To my understanding BIOS builds a very basic list of interrupts and/or "entry points" that add-in card ROMs can utilize and give them the ability to "latch" onto other BIOS's interrupts such as "$PMM"? And how do BIOS manufacturers know what exact anchor strings are needed within their BIOS to be able to boot an operating system like Windows?

Any answers would be very helpful, as well as any recommended specifications and/or any guides able to lead me into the knowledge I've been seeking. Such as maybe a guide to say "the minimum required processes needed to be accomplished by BIOS before handing off to an IPL?" or even an example of source code in C or Assembly with something that can show me what an add-in card's ROM image actually is or looks like would be very helpful.

Solution

I'm restricting this answer to Intel architectures since I'm mostly familiar with them.

The document you (and also I) are looking for is called the BIOS Writer's Guide and, unfortunately, it is confidential and has not leaked so far (AFAIK).

In order to promote their products in the Open Source community, Intel released the Firmware Support Package (FSP). Think of it as a library for firmware writers: it contains (binary) code to initialize the memory controller, the PCH (Platform Controller Hub, informally known as "the chipset"), and the CPU¹.
An open source developer, or in general any developer that cannot afford to sign an NDA with Intel, can use the FSP to write their own firmware.

One could reverse-engineer the FSP (one of my many TODOs), but it's quicker to use it as a reference.

When the power is switched on a lot of things happen before the CPU starts executing from the reset vector² but the important thing to remember is that the chipset (i.e. the PCH) already allows the CPU to access the flash ROM.
In fact, that's how the first instructions are executed since the CPU can only fetch instructions from the memory address space.

So as long as the firmware keeps the execution flow within the region of memory mapped to the flash ROM, its code can be executed. This region is determined by the Flash Descriptor present in the flash ROM itself: the PCH reads it during its reset and configures the routing of memory requests accordingly.
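To make the address arithmetic concrete, here is a minimal sketch. It assumes the usual x86 convention that the first instruction is fetched from physical address 0xFFFFFFF0 and that the flash window is decoded so that it ends exactly at 4 GiB. The 16 MiB flash size used in the note below and the helper names are illustrative only, not taken from any datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Physical address of the first instruction fetch after reset on x86. */
#define RESET_VECTOR UINT64_C(0xFFFFFFF0)
#define FOUR_GIB     (UINT64_C(1) << 32)

/* Real-mode segment:offset to linear address (A20 wrap-around not modelled). */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Base of a flash window ASSUMED to end exactly at 4 GiB (hypothetical helper). */
static uint64_t flash_window_base(uint64_t flash_size)
{
    assert(flash_size > 0 && flash_size <= FOUR_GIB);
    return FOUR_GIB - flash_size;
}

/* True if addr would be served by the flash ROM under that assumption. */
static bool in_flash_window(uint64_t addr, uint64_t flash_size)
{
    return addr >= flash_window_base(flash_size) && addr < FOUR_GIB;
}
```

With a 16 MiB part, `flash_window_base` gives 0xFF000000 and `in_flash_window(RESET_VECTOR, ...)` is true, so the first fetch needs no RAM at all. For comparison, `real_mode_linear(0xFFFF, 0x0000)` from the question evaluates to 0xFFFF0, just below 1 MiB.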

Since memory is not yet initialized and the flash ROM is read-only (w.r.t. memory write cycles), these features cannot be used:

Calls, since they need a writable stack.
Variables in memory, since they, well, vary.

Both are annoying limitations. In assembly you can work around them with jumps and registers, but in C you cannot.
So the first thing done by the firmware usually is setting up a "temporary RAM".
This is the TempRamInit() routine of the FSP (which, by the way, must be called with a jump) and in practice, it sets up Cache-as-RAM (CAR).

Cache-as-RAM

The idea is to use the cache as a temporary RAM.
The fundamental point is that cache lines don't expire; they are evicted only when there's no more space for a newly requested line coming from memory.
So as long as you are careful to avoid accessing more variables than can fit in the cache, the CPU will only read and write from the cache (of course, this requires the write-back caching mode).

However, this would require careful positioning of variables and is indeed very fragile.
A better approach is to enable the cache (by clearing the CD (Cache Disable) bit in the CR0 register) and then do dummy reads (or even writes) from a memory region as large as the L1³.
Then you disable the caches again by setting CD. This mode is actually known as no-fill mode: no new lines are brought into the cache (so no existing line can be "lost"), but reads and writes can still hit in the cache.

This allows a few KiB of "RAM".
There exist C compilers for CAR environments.
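The CR0 bit juggling described above can be sketched as plain value transformations. This is only a host-side illustration: real firmware would read and write CR0 with privileged `mov` instructions and would normally also program the MTRRs, both of which are omitted here. The helper names are mine; the bit positions (CD is bit 30, NW is bit 29) are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_NW (UINT32_C(1) << 29) /* Not Write-through */
#define CR0_CD (UINT32_C(1) << 30) /* Cache Disable */

/* Normal caching: CD = 0, NW = 0. Lines can be filled by the dummy reads. */
static uint32_t cr0_enable_cache(uint32_t cr0)
{
    return cr0 & ~(CR0_CD | CR0_NW);
}

/* No-fill mode: CD = 1, NW = 0. Hits are still served, no new lines come in. */
static uint32_t cr0_no_fill(uint32_t cr0)
{
    return (cr0 | CR0_CD) & ~CR0_NW;
}

/* True if the given CR0 value selects no-fill mode. */
static bool cr0_is_no_fill(uint32_t cr0)
{
    return (cr0 & (CR0_CD | CR0_NW)) == CR0_CD;
}
```

Starting from the power-on value 0x60000010 (CD and NW both set), `cr0_enable_cache` yields 0x00000010. After the dummy reads have populated the lines, `cr0_no_fill` yields 0x40000010, and from then on the touched region behaves like a small RAM.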

Initializing the RAM

Now the firmware can initialize the RAM. To do so, three things must be done:

1. Tell the memory controller about the DIMM timings (CAS, RAS, et al.).
2. Tell the memory controller about the DIMM sizes and ranks.
3. Set the routing.

The memory controller is configured through PCI configuration space and MMIO; you can find the specifics in volume 2 of your processor's datasheet (assuming the MC is on the CPU die).
For example, volume 2 of the 8th and 9th generation Core datasheet describes the memory controller registers, including the one where the firmware sets the tRAS parameter.

Analogously, you'll find the registers for the DIMM size and type, channel size and so on.
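For the "PCI configuration space" part, this is a small sketch of how the legacy port-based access mechanism encodes a bus/device/function/register tuple into the value written to port 0xCF8 (the data is then read or written through port 0xCFC). The actual port I/O is left out so the code can run anywhere. The function name is mine, and which device and offset hold a given memory controller register is something to take from the datasheet, not from this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CONFIG_ADDRESS_PORT 0xCF8 /* write the encoded address here */
#define PCI_CONFIG_DATA_PORT    0xCFC /* then read/write the dword here */

/* Encode bus/device/function/register for the legacy 0xCF8 mechanism.
 * Bit 31 enables the cycle; the register offset is dword-aligned. */
static uint32_t pci_config_address(uint8_t bus, uint8_t dev, uint8_t func,
                                   uint8_t reg)
{
    assert(dev < 32 && func < 8);
    return UINT32_C(0x80000000)
         | ((uint32_t)bus  << 16)
         | ((uint32_t)dev  << 11)
         | ((uint32_t)func << 8)
         | (reg & 0xFCu);
}
```

For instance, `pci_config_address(0, 0, 0, 0x48)` gives 0x80000048. I believe offset 0x48 of device 0:0.0 is where recent Intel client datasheets put the MCHBAR register, which holds the base of the memory controller's MMIO window, but treat that offset as an assumption and check it against your part.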

These registers cover points 1 and 2 (and a bit of point 3, depending on the definition) but how can the firmware know what values to use?
After all, the DIMMs are replaceable.

As already noted, the solution is Serial Presence Detect (SPD), a small EEPROM integrated on the DIMMs themselves that describes the memory timings, topology and size.
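To give an idea of what "describes the size" means in practice, here is a hedged sketch that derives a module's capacity from a few SPD bytes. The byte offsets and encodings follow my reading of the JEDEC DDR4 SPD layout (byte 2: device type, byte 4: die density, byte 12: device width and ranks, byte 13: bus width). Verify them against the JEDEC document before relying on them; other DDR generations use different offsets, and 3DS stacked parts are ignored. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPD_TYPE_DDR4 0x0C /* value of SPD byte 2 for DDR4 SDRAM */

/* Module capacity in MiB from a DDR4 SPD image, or 0 if the image is too
 * short or not DDR4. Offsets are ASSUMED from the JEDEC DDR4 SPD layout. */
static uint32_t spd_ddr4_capacity_mib(const uint8_t *spd, size_t len)
{
    assert(spd != NULL);
    if (len < 14 || spd[2] != SPD_TYPE_DDR4)
        return 0;

    uint32_t density_mbit = UINT32_C(256) << (spd[4] & 0x0F); /* per die   */
    uint32_t dev_width    = 4u << (spd[12] & 0x07);           /* x4/x8/... */
    uint32_t ranks        = ((spd[12] >> 3) & 0x07) + 1u;     /* per DIMM  */
    uint32_t bus_width    = 8u << (spd[13] & 0x07);           /* data bits */

    /* (MiB per die) * (dies per rank) * (ranks) */
    return (density_mbit / 8) * (bus_width / dev_width) * ranks;
}
```

A single-rank module built from eight 8 Gbit x8 chips on a 64-bit bus (bytes 4, 12, 13 equal to 0x05, 0x01, 0x03) comes out as 8192 MiB under these assumptions.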

The EEPROM is accessed over an I2C-compatible bus.
In the Intel architecture, the bus actually used is the SMBus (System Management Bus), which is compatible with I2C and was created for exactly this kind of purpose.
The SMBus master is found in the PCH and is documented in volume 2 of the datasheet for the relevant series, for example the 200 series PCH datasheet.

The SMBus master must be configured before being used, but that is very simple. Once configured, it can be used to read the SPD data.
This works exactly like accessing any other I2C device.
The SPD EEPROMs (there can be more than one, of course: one per DIMM) are assigned the addresses 0x50 to 0x57 (on the 200 series PCH).
It's possible to write to the SPD, and the SMBus master has a bit to disable such behavior.
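Here is a rough sketch of the probing loop over the 0x50 to 0x57 range. To keep it independent of any particular PCH, the bus access is hidden behind a hypothetical callback (`smbus_read_byte_fn`). Real firmware would implement that callback on top of the SMBus host controller registers from the datasheet. The fake bus at the end exists only so the code can be exercised on a normal machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPD_ADDR_FIRST 0x50
#define SPD_ADDR_LAST  0x57

/* Hypothetical accessor: read one byte at `offset` from the device at 7-bit
 * address `addr`. Returns 0 on success, non-zero if nothing answers. */
typedef int (*smbus_read_byte_fn)(void *ctx, uint8_t addr, uint8_t offset,
                                  uint8_t *out);

/* On-the-wire address byte: 7-bit address in bits 7:1, R/W in bit 0. */
static uint8_t i2c_address_byte(uint8_t addr7, int is_read)
{
    assert(addr7 < 0x80);
    return (uint8_t)((addr7 << 1) | (is_read ? 1 : 0));
}

/* Probe 0x50..0x57; bit n of the result is set if slot n answered. */
static uint8_t spd_scan(smbus_read_byte_fn read_byte, void *ctx)
{
    uint8_t present = 0;
    for (uint8_t addr = SPD_ADDR_FIRST; addr <= SPD_ADDR_LAST; addr++) {
        uint8_t dummy;
        if (read_byte(ctx, addr, 0, &dummy) == 0)
            present |= (uint8_t)(1u << (addr - SPD_ADDR_FIRST));
    }
    return present;
}

/* Read up to `len` bytes (first 256 only); returns how many were read. */
static size_t spd_read(smbus_read_byte_fn read_byte, void *ctx, uint8_t addr,
                       uint8_t *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len && i < 256; i++)
        if (read_byte(ctx, addr, (uint8_t)i, &buf[i]) != 0)
            break;
    return i;
}

/* Test double, not real hardware: ctx points to a bitmask of populated
 * slots; every populated EEPROM reports 0x0C at offset 2 and 0 elsewhere. */
static int fake_read_byte(void *ctx, uint8_t addr, uint8_t offset,
                          uint8_t *out)
{
    uint8_t populated = *(const uint8_t *)ctx;
    if (addr < SPD_ADDR_FIRST || addr > SPD_ADDR_LAST)
        return -1;
    if (!(populated & (1u << (addr - SPD_ADDR_FIRST))))
        return -1;
    *out = (offset == 2) ? 0x0C : 0x00;
    return 0;
}
```

With a populated mask of 0x05, `spd_scan(fake_read_byte, &mask)` reports slots 0 and 2, i.e. the EEPROMs at 0x50 and 0x52. Keeping the accessor behind a function pointer is just one design choice; it lets the same scanning logic be tested off-target.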

Once the SPD data are read, the MC can be configured, and then the RAM can be used.
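One small piece of that configuration step, as a sketch: the DIMM states its timings as durations, while controller registers such as the tRAS one typically want a count of memory clock cycles, so the firmware has to round up to whole clocks. The helper below does only that rounding. It ignores the guard-band rules that JEDEC defines for the conversion, and the name is mine.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of clock cycles covering t_ps picoseconds, given a clock
 * period of tck_ps picoseconds: ceil(t_ps / tck_ps). */
static uint32_t timing_to_clocks(uint32_t t_ps, uint32_t tck_ps)
{
    assert(tck_ps > 0);
    return (t_ps + tck_ps - 1) / tck_ps;
}
```

For example, a 35 ns tRAS at a 1250 ps clock period is exactly 28 clocks, while 32 ns at an 833 ps period rounds up to 39.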

This is the FspMemoryInit() routine of the FSP.

The last step is configuring the routing.
This includes setting up the end of the RAM region in the memory address space (refer to the PCH datasheet for a complete picture) and, in a NUMA system, the Source Address and Target Address decoders that route memory requests across sockets through QPI/UPI links.
All of this is done through the PCI configuration space of the integrated devices in the PCH.
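As a simplified model of the single-socket case, the sketch below computes where RAM ends up in the address space when a hole below 4 GiB is kept for MMIO and the displaced RAM is remapped above 4 GiB. The structure, the function name and the single-hole model are my assumptions. Real chipsets also carve out further regions (graphics stolen memory, SMM, ME) and impose alignment rules that are not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define GIB      (UINT64_C(1) << 30)
#define FOUR_GIB (4 * GIB)

/* Hypothetical summary of the resulting layout. */
struct ram_map {
    uint64_t low_top;   /* RAM occupies [0, low_top)                    */
    uint64_t high_base; /* remapped RAM occupies [high_base, high_top)  */
    uint64_t high_top;  /* equals high_base when nothing is remapped    */
};

/* total_ram: installed RAM in bytes; mmio_hole_base: start of the MMIO
 * hole, which is ASSUMED to extend up to 4 GiB. */
static struct ram_map compute_ram_map(uint64_t total_ram,
                                      uint64_t mmio_hole_base)
{
    assert(mmio_hole_base <= FOUR_GIB);
    struct ram_map m;
    m.low_top   = total_ram < mmio_hole_base ? total_ram : mmio_hole_base;
    m.high_base = FOUR_GIB;
    m.high_top  = FOUR_GIB + (total_ram - m.low_top);
    return m;
}
```

With 8 GiB installed and the hole starting at 3 GiB, this gives RAM at [0, 3 GiB) and [4 GiB, 9 GiB). With only 2 GiB installed nothing needs remapping and `high_top` equals `high_base`.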

In NUMA systems it is necessary to boot the other Application Processors (one per socket) to configure their memory controllers too.
This is done with Inter Processor Interrupts (IPIs) issued through the LAPIC, which is an MMIO component in each CPU.
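For reference, this is roughly what the values written to the LAPIC's Interrupt Command Register look like for the classic INIT and Startup IPIs. The sketch assumes xAPIC mode (MMIO registers at the default base 0xFEE00000, destination APIC ID in bits 31:24 of the high dword) and only builds the register values. The actual MMIO writes and the delays between the IPIs are left out, and the helper names are mine.

```c
#include <assert.h>
#include <stdint.h>

#define LAPIC_DEFAULT_BASE UINT32_C(0xFEE00000)
#define LAPIC_ICR_LOW      0x300 /* writing this dword sends the IPI */
#define LAPIC_ICR_HIGH     0x310 /* destination; write this one first */

#define ICR_DELIVERY_INIT    (UINT32_C(5) << 8)
#define ICR_DELIVERY_STARTUP (UINT32_C(6) << 8)
#define ICR_LEVEL_ASSERT     (UINT32_C(1) << 14)

/* High dword: destination APIC ID in bits 31:24 (xAPIC mode assumed). */
static uint32_t icr_high_for(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}

/* Low dword for an INIT IPI. */
static uint32_t icr_low_init(void)
{
    return ICR_DELIVERY_INIT | ICR_LEVEL_ASSERT;
}

/* Low dword for a Startup IPI. The target starts executing in real mode at
 * entry_phys, which must be 4 KiB aligned and below 1 MiB. */
static uint32_t icr_low_startup(uint32_t entry_phys)
{
    assert((entry_phys & 0xFFFu) == 0 && entry_phys < 0x100000u);
    return ICR_DELIVERY_STARTUP | ICR_LEVEL_ASSERT | (entry_phys >> 12);
}
```

So waking the processor with APIC ID 2 at a trampoline placed at 0x8000 would mean writing 0x02000000 to the high dword, followed by 0x00004500 (INIT) and later 0x00004608 (Startup) to the low dword.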

Summary

The rough steps performed by the firmware are:

Perform any basic environment initialization (e.g. switch to 32-bit mode).
Initialize Cache-As-RAM.
Initialize the SMBus master in the PCH using the PCI enumeration.
Read the SPD EEPROM of each DIMM.
Configure the Memory Controller of each socket with the SPD data.
Configure the PCH memory map.
Configure the NUMA routing.

¹ The CPU doesn't need initialization; in fact, a lot of code has already been executed by the time the FSP initialization routine is called. They probably meant "fine-tuning" of some more or less documented features.

² They won't be discussed here but, briefly: the Embedded Controller (on laptops; hardwired logic on desktops) is turned on first and, once booted (using its integrated ROM), its firmware uses GPIOs to switch on the necessary power gates of the board. One of these gates powers the PCH which, once the EC firmware asserts the right pin, boots its own firmware and resets the chipset. (This firmware is known as the Management Engine Firmware because it's bundled with the rest of the ME code, inside the ME region of the same flash ROM that also contains the BIOS code, but technically it's the Bring-Up, BUP, module.) Once the chipset is ready, it asserts the power good pin of the CPU and then the reset/init pin(s), which causes the CPU to start executing the POST and then, assuming a TXT capable CPU, the microcode to fetch the Firmware Interface Table from the flash ROM and from it the SINIT ACM (System Init Authenticated Control Module, which will set up the security necessary for a measured launch) and optionally the BIOS ACM (which will perform vendor-specific tasks, possibly including booting, skipping the legacy reset vector). Eventually, the BIOS ACM (or the microcode if no BIOS ACM was found in the FIT) will jump to the reset vector. This is the legacy boot flow. Note that the ACMs are executed in a specially crafted environment that employs Cache-as-RAM (see above), following the semantics of any other TXT launch (refer to the Intel TXT specifications).

³ According to Intel, when CD is set, no line replacement is done. I assume it would not move lines back and forth between cache levels either.