Search code examples
x86cpu-architecturepagingtlbpage-tables

How prompt is x86 at setting the page dirty bit?


From a software point of view, what is the latency between an instruction that dirties a memory page and when the core actually marks the page dirty in the Page Table Entry (PTE)?

In other words, if an instruction dirties a page, can the very next instruction read the PTE and see the dirty bit set?

I don't care about the actual elapsed cycles, only if there is a software visible window in which the dirty bit is not yet set. I can't seem to find any guarantees in the reference manuals.


Solution

  • From the AMD's manual (circa 2005), Volume 2: System Programming:

    5.4 Page-Translation-Table Entry Fields ... Dirty (D) Bit. Bit 6. This bit is only present in the lowest level of the page-translation hierarchy. It indicates whether the pagetranslation table or physical page to which this entry points has been written. The D bit is set to 1 by the processor the first time there is a write to the physical page.

    Ditto from Intel (circa 2006), Volume 3-A: System Programming Guide, Part 1:

    3.7.6 Page-Directory and Page-Table Entries ... Dirty (D) flag, bit 6 Indicates whether a page has been written to when set. (This flag is not used in page-directory entries that point to page tables.) Memory management software typically clears this flag when a page is initially loaded into physical memory. The processor then sets this flag the first time a page is accessed for a write operation.

    UPDATE:

    From the latest Intel manual (vol 3A, System Programming Guide):

    8.1.2.1 Automatic Locking The operations on which the processor automatically follows the LOCK semantics are as follows: ... When updating page-directory and page-table entries — When updating page-directory and page-table entries, the processor uses locked cycles to set the accessed and dirty flag in the page-directory and page-table entries.

    From the rest of the text in sections 8.1 and 8.2 it follows that once the CPU sets the dirty bit using the locked operation, the other CPUs should start seeing the updated value.

    Of course, you may have a race condition in that you first read the dirty bit as 0 on one CPU (or in one of its threads) and later another CPU (or another thread on the same CPU) causes this bit to be set to 1, but that isn't any unusual.