x86 x86-64 privileges uefi msr

x86 CPU Modes / Rings During Boot Process


I am currently trying to understand how exactly platform initialization works on modern UEFI x86 systems. However, it is unclear to me how the privilege modes are defined, and where and how they change.

I recently saw a document from Intel that differentiated between BIOS, SMM, and OS for MCHBAR access. From that, I infer there must be a mode more privileged than SMM. Even if the boot starts in real mode, UEFI cannot stay in it because of the 1 MiB memory limit, right?

Also, does the "UEFI code" all execute at the same privilege level, given that this code can be further differentiated into "PEI" and "DXE"?

There are some resources, such as https://secret.club/2020/05/26/introduction-to-uefi-part-1.html, but they do not introduce the different privileges.

I guess there is an MSR that locks the access to certain configuration space regions after a microcode load (like the PRM lock in SGX https://github.com/coreboot/coreboot/blob/master/src/soc/intel/common/block/sgx/sgx.c), but I did not find any resources to validate my hypothesis yet.

[EDIT]

Another excellent resource for the post-UEFI process and Linux is described here

I also came across this course which partially covers low level UEFI stuff here


Solution

  • I'm not sure what document you are referring to.

    I'm going by memory while writing this answer; I hope I didn't miss too much or make too many mistakes.

    I know of four main mechanisms to enforce security at the hardware level:

    1. The ucode has access to the internal registers of the uncore components (like the iMC) and the PCH (over the DMI/OPI link), so it can "open" protected features. For example, the ucode routine that transitions to SMM will open the SMM RAM (it writes some bits in a specific register over the ring/CRBUS, which makes the iMC claim accesses to this protected RAM). Similarly, at reset, the ucode parsing the FIT will tell the PCH to open TPM locality 4.

    2. PCIe source ID. Each PCIe transaction is marked (I don't remember if in-band or out-of-band) with the Bus/Dev/Fun of the originator. The recipient device can make a policy decision based on this. For example, a Flash Descriptor is used to tell the PCH which component (CSME, CPU, other devices) can access which flash regions. As a more specific example, the CPU can read the BIOS region and the Gigabit firmware region but not the CSME region, so if you try to dump the flash memory from the CPU, reads to the CSME region will fail.

    3. Some configuration registers have lock bits that are cleared only by a reset. In this scenario, the first code to execute has more privilege than the code that runs later.

    4. In an MLE (Measured Launch Environment), the code at stage i will measure (see the TPM PCRs) and verify the code at stage i+1 before executing it. This creates a chain of trust rooted in the CPU ucode boot routine (the ACMs are measured and verified).
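
    The measuring half of mechanism 4 can be sketched with the TPM-style "extend" operation, where the new PCR value is the hash of the old value concatenated with the digest of the next stage. This is only an illustration: the stage images are hypothetical byte strings, a single SHA-256 PCR starting at all zeros is assumed, and the verification step (signature checks on each stage) is not modeled.

```python
import hashlib

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes for a SHA-256 PCR bank


def extend(pcr: bytes, data: bytes) -> bytes:
    """Return the PCR value after extending it with a measurement of data."""
    measurement = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr + measurement).digest()


def measure_chain(stages: list[bytes]) -> bytes:
    """Fold every stage into one PCR, starting from an assumed all-zero value."""
    pcr = bytes(PCR_SIZE)
    for image in stages:
        pcr = extend(pcr, image)
    return pcr


# Hypothetical stage images; real firmware would hash the actual binaries.
boot_stages = [b"ACM image", b"PEI modules", b"DXE drivers"]
good = measure_chain(boot_stages)
tampered = measure_chain([b"ACM image", b"PEI modules (modified)", b"DXE drivers"])
reordered = measure_chain([b"PEI modules", b"ACM image", b"DXE drivers"])

print(good.hex())
print(good != tampered, good != reordered)
```

    Because each value depends on everything extended before it, changing or reordering any stage yields a different final PCR, which is what lets a later verifier detect the change.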

    The CPU can easily tell if it's in SMM mode, so it can open broader access to the MCHBAR (though I always thought the OS had complete control over the MCHBAR).

    As for executing "BIOS", there is technically no difference from the OS.
    However, this is true only for the legacy boot path. Since Haswell, the CPU boots with an opt-out version of Intel TXT, which uses ACMs.
    The execution of an ACM from the FIT (i.e. during the boot) is easily distinguished by the CPU, which grants it access to protected features (like TPM locality 4).

    I don't think the BIOS relies on any secret instruction or configuration register/MSR, since that would be Security Through Obscurity (STO).
    Instead, I believe the BIOS is just as privileged as the OS (except when in SMM); it only has more knowledge of the internal details (e.g. how to configure the iMC SADs and TADs), which in theory are not critical for security.

    One thing to bring into the picture is the MLE: in an MLE the PEI modules are verified. Not only their code is measured, but also the values of the CPU and PCH configuration registers (you can measure any value; order is important).
    This guarantees to a remote verifier that the system hardware configuration was not tampered with, and it forms a chain of trust where the earlier stages have control over the later ones.

    So I would say:

    1. After reset, the CPU is executing an Intel TXT ucode routine and, like all ucode routines, it has full access to the CPU's internal components and to all compliant external ones. This is the maximum privilege possible.
    2. The ACM executed before the legacy boot has normal SMX privileges, which are like Protected Mode Ring 0 but with access to more hardware configuration registers (and possibly less access to ISA extensions) thanks to the boot ucode routine (this can include special access to the MCHBAR). This is the SEC phase, I guess.
    3. The configuration of the platform can be done with an ACM, with legacy code, or with both. ACMs have been discussed above. During the legacy boot the BIOS will configure the CPU (including the uncore, like the iMC), the PCH and any peripheral with a privilege that should be exactly the same as that of an OS, unless the BIOS uses some STO. Of course, the BIOS can lock lockable registers (e.g. VMX support). The BIOS will probably use the Intel Firmware Support Package to configure the PCH and most of the CPU. This is the PEI phase, I guess. This phase can be measured if the platform supports it and is configured to do so.
    4. From now on (DXE phase, OS booting) I think the code has the same privileges as an OS; even if PEI is done with an ACM, this phase should not be. In an MLE you still have a chain of trust and locked registers stay locked but, as far as the CPU goes, it cannot really tell where the firmware ends and the OS starts.
    5. In the event of an SMI, the CPU enters SMM, which gives it slightly more privilege than the OS.
    6. Other executing agents, like the CSME or the EC, can have more or less privilege due to their position (e.g. the Gigabit controller can surely read the Gigabit flash region) or their wiring (e.g. the EC could control some PCH straps).
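
    The idea that privilege can depend on who is asking, as with the flash regions, can be pictured as a lookup table keyed by requester and region. The sketch below is purely illustrative: the table layout, the names and the `is_allowed` helper are hypothetical and do not reflect the real Flash Descriptor format. The CPU and Gigabit rows follow the examples given in this answer; the CSME row is an assumption.

```python
from enum import Flag, auto


class Access(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()


# Hypothetical policy table: (requester, region) -> granted access.
# Anything not listed is denied.
POLICY = {
    ("CPU", "BIOS"): Access.READ,   # from the answer: the CPU can read the BIOS region
    ("CPU", "GbE"): Access.READ,    # from the answer: and the Gigabit firmware region
    ("GbE", "GbE"): Access.READ,    # from the answer: the controller reads its own region
    ("CSME", "CSME"): Access.READ | Access.WRITE,  # assumption, for illustration only
}


def is_allowed(requester: str, region: str, wanted: Access) -> bool:
    """Return True if every access bit in `wanted` is granted to the requester."""
    granted = POLICY.get((requester, region), Access.NONE)
    return (granted & wanted) == wanted


print(is_allowed("CPU", "BIOS", Access.READ))   # True
print(is_allowed("CPU", "CSME", Access.READ))   # False: the dump fails for this region
```

    The point of the model is that the same read request succeeds or fails depending only on the identity of the requester, not on the CPU mode or ring it runs in.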

    The most common way to prevent further modification of a register is a lock bit.
    The likely way to alter the behavior of a component (e.g. the iMC) is through a ucode routine that executes when the CPU receives a special event (like an SMI or a reset).
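
    The lock-bit mechanism can be sketched as a toy model. The bit positions are loosely modeled on the IA32_FEATURE_CONTROL MSR (bit 0 as the lock, bit 2 enabling VMX outside SMX); the class and its methods are hypothetical, and a rejected write simply returns False here, whereas real hardware may raise a fault instead.

```python
LOCK = 1 << 0             # once set, further writes are rejected until reset
VMX_OUTSIDE_SMX = 1 << 2  # feature bit the firmware may want to freeze


class LockableRegister:
    """Toy model of a configuration register protected by a lock bit."""

    def __init__(self) -> None:
        self.value = 0  # assumed reset value: unlocked, features off

    @property
    def locked(self) -> bool:
        return bool(self.value & LOCK)

    def write(self, new_value: int) -> bool:
        """Apply the write unless locked; return whether it took effect."""
        if self.locked:
            return False  # simplification: a real locked MSR write can fault
        self.value = new_value
        return True

    def reset(self) -> None:
        """Model a platform reset, the only event that clears the lock."""
        self.value = 0


reg = LockableRegister()
print(reg.write(VMX_OUTSIDE_SMX | LOCK))  # firmware configures and locks: True
print(reg.write(0))                       # later code (e.g. the OS) is refused: False
reg.reset()
print(reg.write(0))                       # after a reset the register is writable: True
```

    Whoever runs first after reset decides the value and sets the lock, which is why earlier boot stages end up with more effective privilege than later ones even though they run in the same CPU mode.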