I am writing to inquire the feasibility of tracing the page table access (in terms of "index" of each page table access) of a common Linux user application. Basically, what I am doing is to re-produce the exploitation mentioned in this research article (https://www.ieee-security.org/TC/SP2015/papers-archived/6949a640.pdf). In particular, the data-page accesses need to be recorded for usage and inference of program secrets.
I understand the on Linux system, 64-bit x86 architecture, the page table size is 4K. And i have used pin
(https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool) to log a trace of addresses for all virtual memory access. So can I simply calculate the "index" of each data page table access, with the following translation rule?
index = address >> 15
Since 4KB = 2 ^ 15
. Is it correct? Thank you in advance for any suggestions or comments.
Also, I think one thing I want to point out is that conceptually, I don't need a "precise" identifier of each data page table ID, but just a number ("index") to distinguish the access of different data pages. This shall provide conceptually identical amount of information compared with their attacks.
Ok, so you don't really need an "index", but just some unique identifier to distinguish different pages in the virtual address of a process.
In such case, then you can just do address >> PAGE_SHIFT
. In x86 with 4KB pages PAGE_SHIFT
is 12
, so you can do:
page_id = address >> 12
Then if address1
and address2
correspond to the same page the page_id
will be the same for both addresses.
Alternatively, to achieve the same result, you could do address & PAGE_MASK
, where PAGE_MASK
is just 0xfffffffffffff000
(that is ~((1UL << PAGE_SHIFT) - 1)
).