Search code examples
linuxlinux-device-driverembedded-linuxmmap

Map physical memory to userspace as normal, struct page backed mapping


I have a custom device driver that implements an mmap operation to map a shared RAM buffer (outside of the OS) to userspace. The buffer is reserved by passing mem=32M as a boot argument for the OS, leaving the rest of the 512MB available as a buffer. I would like to perform zero-copy operations from the mapped memory, which is not possible if the vm_flags include VM_PFNMAP and VM_IO.

My driver currently performs the mapping by calling vm_iomap_memory(vma, start, size), which in turn calls io_remap_pfn_range and remap_pfn_range, which sets up the vma with the VM_PFNMAP and VM_IO set. This works to map the memory to userspace, but zero-copy socket operations fail at get_user_pages due either to the VM_PFNMAP flags being set or the struct page being missing. The comments for remap_pfn_range show this is intended behavior, as pfn-mapped memory should not be treated as 'normal'. However, for my case it is just a block of reserved RAM, so I don't see why it should not be treated as normal. I have set up cache invalidation/flushing to manually manage the memory.

I have tried unsetting the VM_PFNMAP and VM_IO flags on the vm_area_struct both during and after the mapping, but get_user_pages still fails. I have also looked at the dma libraries but it looks like they rely on a call to remap_pfn_range behind the scenes.

My question is how do I map physical memory as a normal, non-pfn, struct page-backed userspace address? Or is there some other way I should be looking at it? Thanks!


Solution

  • I've found the solution to mapping a memory buffer outside the Kernel that requires a correction to several wrong starting points that I mentioned above. It's not possible to post full source code here, but the steps to get it working are:

    1. Device tree: Define reserved memory region for buffer with no associated driver. Do not use mem or memmap bootargs. Kernel will confine itself to using memory outside of this reserved space for itself, but now will be able to make struct pages for reserved memory.
    2. In a device driver (a LKM in my case), mapping physical address to kernel virtual address requires using using memremap instead of ioremap, as it is real memory we are mapping.
    3. In device driver mmap routine, do not use any variant of remap_pfn_range to setup the vma for usespace, instead assign a custom fault nopage routine to the vma->vm_ops.fault to look up the page when the userspace virtual address is used. This approach is described in lddv3 ch15.
    4. The nopage function in the driver should use the vm_fault structure argument that is passed to it to calculate the offset into the vma for the address that needs a page. Then use that offset to calculate an kernel virtual address (against the memremap'd address), and get the page with a call to page = virt_to_page(pageptr);, followed by a call to get_page(page);, and assign it to the vm_fault structure with vmf->page = page; The latter part of this is illustrated in lddv3 chapter 15 as well.

    The memory mapped in this fashion using mmap against the custom device driver can be used just like normal malloc'd memory as far as I can tell. There are probably ways to achieve a similar result with the DMA libraries, but I had constraints preventing that route, or associating the device tree node with the driver.