
Why does Windows defer creation of PPTEs for data files?


The FSD calls CcInitializeCacheMap the first time the file is read or written, which creates the private cache map, section object, control area, segment and subsections if they don't already exist. When the cache manager creates the section object, it specifies that it is a data section (SEC_DATA in the AllocationAttributes parameter of NtCreateSection), which means the PPTEs aren't initially configured and the subsection base is left blank. The actual read is done by CcCopyRead, which first allocates VACBs, mapping views of the file, and then copies from those views to the caller's buffer. Every time it allocates a VACB, it maps a 256 KB view into the cache manager's virtual address space. When it makes this mapping it needs to initialise the corresponding PPTEs, but once it initialises one PPTE it might as well initialise all of them for the whole file, because they need to be contiguous by design; that memory is going to be reserved whether the PPTEs are filled in or not.
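To make the view/PPTE relationship concrete, here is a small arithmetic sketch of which PPTEs a single 256 KB cache-manager view covers. It assumes 4 KB pages, 8-byte PPTEs (x64) and the question's model of one contiguous PPTE array per file; the function names are hypothetical and this is not how the cache manager or Memory Manager is actually coded.

```python
# Arithmetic sketch, assuming 4 KB pages, 8-byte PPTEs and a single
# contiguous PPTE array per file (the model described in the question).
PAGE_SIZE = 4 * 1024
VACB_VIEW_SIZE = 256 * 1024
PPTE_SIZE = 8  # bytes per prototype PTE on x64


def vacb_view_for_offset(file_offset: int) -> tuple[int, int]:
    """Return (view_start, view_end) in bytes for the 256 KB view containing file_offset."""
    view_start = file_offset - (file_offset % VACB_VIEW_SIZE)
    return view_start, view_start + VACB_VIEW_SIZE


def ppte_range_for_offset(file_offset: int) -> range:
    """Indices into the file's PPTE array that mapping that view would touch."""
    start, end = vacb_view_for_offset(file_offset)
    return range(start // PAGE_SIZE, end // PAGE_SIZE)


view = vacb_view_for_offset(0x50000)
pptes = ppte_range_for_offset(0x50000)
print(view, pptes.start, pptes.stop, len(pptes) * PPTE_SIZE)
```

For a read at offset 0x50000 the view is bytes 262144 to 524288 of the file, which corresponds to PPTE indices 64 to 127: 64 PPTEs, or 512 bytes of PPTEs per view.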

Windows Internals says that Windows defers creating PPTEs for data files until the first view is mapped, whereas for image files it creates them when the section object is created:

For page-file-backed sections, an array of prototype PTEs is created when a section object is first created. For mapped files, portions of the array are created on demand as each view is mapped.

And another source states:

While mapping a data file, the main purpose of MiCreateDataFileMap is to setup the subsection object. In the normal case only one subsection is created, but under some special conditions multiple subsections are used, e.g. if the file is very large. For data files, the subsection field SubsectionBase is left blank. This defers the creation of PPTE until the section is mapped into the memory and finally accessed for the first time. The reasoning behind this is to avoid wasting memory when very large data files are mapped. Instead the SegmentPteTemplate field of the segment object is setup properly which can be used to create the PPTEs subsequently if necessary.
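The deferral described in the quote amounts to lazy initialisation: the subsection starts with no PPTE array, and the array is built from a template the first time it is needed. The toy model below only illustrates that pattern. The field names mirror SubsectionBase and SegmentPteTemplate from the quote, but the class, the integer template value and the list representation are hypothetical stand-ins, not the Memory Manager's real structures.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumed 4 KB pages


@dataclass
class Subsection:
    """Toy model of a data-file subsection whose PPTE array starts out absent."""
    num_pages: int
    segment_pte_template: int                    # stand-in for SegmentPteTemplate
    subsection_base: Optional[list[int]] = None  # stand-in for SubsectionBase

    def ensure_pptes(self) -> list[int]:
        # Deferred creation: build the PPTE array from the template on first use only.
        if self.subsection_base is None:
            self.subsection_base = [self.segment_pte_template] * self.num_pages
        return self.subsection_base

    def ppte_for_offset(self, offset: int) -> int:
        # Any access through the subsection forces the array into existence.
        return self.ensure_pptes()[offset // PAGE_SIZE]


sub = Subsection(num_pages=1024, segment_pte_template=0xABC)
print(sub.subsection_base is None)  # nothing allocated yet
print(hex(sub.ppte_for_offset(8192)))
```

Note that this models the whole subsection's array being created at once, which is the behaviour the question argues is unavoidable; it does not model the "portions on demand" variant.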

I'm just going to disagree with this: if a 4 GB data file were mapped, its PPTEs would only need 8 MB (2^20 pages × 8 bytes), which is not a lot of space, so I don't see the benefit. The real space saving comes from the VACBs and from the fact that the whole file doesn't have to reside in physical memory.
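The size of a full PPTE array is simple arithmetic: one PPTE per page of the file. The sketch below assumes 4 KB pages and 8-byte PPTEs (x64); the function name is hypothetical.

```python
PAGE_SIZE = 4 * 1024  # assumed 4 KB pages
PPTE_SIZE = 8         # assumed 8-byte prototype PTEs (x64)

MiB = 1024 ** 2
GiB = 1024 ** 3


def ppte_array_bytes(file_size: int) -> int:
    """Bytes of PPTEs needed to describe every page of a file of file_size bytes."""
    pages = -(-file_size // PAGE_SIZE)  # round up to whole pages
    return pages * PPTE_SIZE


print(ppte_array_bytes(4 * GiB) // MiB)
print(ppte_array_bytes(40 * GiB) // MiB)
```

This gives 8 MB of PPTEs for a 4 GB file and 80 MB for a 40 GB file, i.e. the PPTE array is 1/512 of the file size under these assumptions.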

It still doesn't make sense to me that they say 'portions' of the PPTE array are created on demand, as if that saves space. It doesn't: as soon as one PPTE is mapped, the whole area is reserved and SubsectionBase is set in the subsection, because the PPTEs for the whole file need to be contiguous.

'Portions' could be initialised on demand, but that doesn't change the fact that the space is reserved regardless of whether there are PPTEs there or not. For instance, when a VACB is allocated it could initialise all the PPTEs covering its 256 KB. When a page fault occurs, the PTEs point to the PPTEs and the PPTEs will be invalid, so it could perform one 256 KB clustered I/O at 256 KB granularity, meaning any other page faults in that view would be soft page faults.

(I believe that when the PPTEs are first allocated, at the time the PTEs for the VACB view are allocated, the PTEs are made to point to the PPTEs. Windows Internals comes up with some bollocks on page 411 about the faulting virtual address being used to search the process's VADs for the PPTE start and end, but the system cache is part of kernel memory, which is not tracked by VADs, so it's wrong. That method is only used for user-space mappings.)

One issue with this is that a page might have been modified, so it can't necessarily do the 256 KB clustering optimisation (1 hard page fault, 63 soft) and lock the pages into memory to perform the I/O. It would have to know that all the PPTEs are newly allocated and that the fault isn't just because a single frame was not present. The best thing to do would be to perform the I/O when a view in the range is mapped and its PPTEs are allocated, so that no page fault occurs when the read takes place. Otherwise it would have to service 64 hard page faults.
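The hard/soft fault trade-off above can be illustrated with a toy simulation of one 256 KB view (64 pages). It is only a sketch of the argument, not of Windows' real fault handling or clustering policy: the function is hypothetical, "resident" stands for pages whose PPTEs are already valid (for example modified pages), and a fault on a resident page is counted as soft.

```python
PAGES_PER_VIEW = 256 * 1024 // 4096  # 64 pages per 256 KB view, assuming 4 KB pages


def count_faults(touched_pages, resident, cluster_whole_view):
    """Count (hard, soft) faults when pages of one view are touched in order.

    resident: page indices already in memory before the first access.
    cluster_whole_view: if True, the first hard fault brings in every
    non-resident page of the view with one clustered I/O (already
    resident pages, e.g. modified ones, are left alone).
    """
    resident = set(resident)
    hard = soft = 0
    for page in touched_pages:
        if page in resident:
            soft += 1          # PPTE already valid: soft fault
            continue
        hard += 1              # PPTE invalid: needs disk I/O
        if cluster_whole_view:
            resident.update(range(PAGES_PER_VIEW))
        else:
            resident.add(page)
    return hard, soft


print(count_faults(range(PAGES_PER_VIEW), set(), cluster_whole_view=True))
print(count_faults(range(PAGES_PER_VIEW), set(), cluster_whole_view=False))
```

Reading the whole view sequentially gives 1 hard and 63 soft faults with clustering, versus 64 hard faults without it, which are the two cases compared above.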

So what's the point? It might as well initialise the whole PPTE array as soon as the data-file section is created, because in the scenario above it's going to be read immediately anyway. I can't think of a scenario where a process would map a load of files into its address space and then not touch them. Even if it mapped 40 GB, the PPTEs would still only take up 80 MB if they were initialised when the data section is created.


Solution

  • I suppose the benefit comes from the fact that the virtual range reserved for the PPTEs is not actually backed by physical memory, even though the whole range of PPTEs is taken up in virtual address space for each process. When a VACB is mapped, the portion of the PPTEs for that range is filled in, and only then does it cost physical memory. It's still only 80 MB for a 40 GB file, however: 80 MB would be reserved in virtual memory but 0 in physical memory, and the physical footprint would grow by 64 × 8 = 512 bytes per 256 KB view.
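Under this answer's model (the whole PPTE array reserved in virtual memory, filled in per view), the reserved and used sizes can be computed directly. The sketch below assumes 4 KB pages and 8-byte PPTEs, and additionally assumes physical memory backs the PPTE array one whole page at a time, so a 4 KB page of PPTEs (512 entries, covering 2 MB of file) is committed the first time any view in that 2 MB is mapped. The function is hypothetical and illustrates the model, not the actual Windows implementation.

```python
PAGE_SIZE = 4096
PPTE_SIZE = 8
VIEW_SIZE = 256 * 1024
PPTES_PER_PAGE = PAGE_SIZE // PPTE_SIZE  # 512 PPTEs fit in one page
PPTES_PER_VIEW = VIEW_SIZE // PAGE_SIZE  # 64 PPTEs describe one view


def ppte_footprint(file_size, mapped_view_indices):
    """Return (reserved_virtual, raw_ppte_bytes, committed_physical) in bytes.

    reserved_virtual: the whole PPTE array, reserved up front.
    raw_ppte_bytes: 64 * 8 bytes for each distinct mapped view.
    committed_physical: whole pages of the PPTE array those views touch.
    """
    total_pptes = -(-file_size // PAGE_SIZE)
    reserved = total_pptes * PPTE_SIZE
    views = set(mapped_view_indices)
    raw = len(views) * PPTES_PER_VIEW * PPTE_SIZE
    # 8 consecutive views share one page of PPTEs (512 / 64 = 8).
    touched_pages = {v * PPTES_PER_VIEW // PPTES_PER_PAGE for v in views}
    return reserved, raw, len(touched_pages) * PAGE_SIZE


GiB = 1024 ** 3
print(ppte_footprint(40 * GiB, [0]))
print(ppte_footprint(40 * GiB, range(8)))
```

Mapping a single view of a 40 GB file reserves 80 MB of PPTEs but fills in only 512 bytes of them; if backing is page-granular, that one view commits a 4 KB page, which the next seven views in the same 2 MB of file then share.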