I am working on a project where we have a kernel module that allocates memory using kmalloc() and maps it to userspace using remap_pfn_range(). In remap_pfn_range(), the vm_area_struct has its vm_flags field set to VM_IO | VM_PFNMAP (as well as some other flags that seem irrelevant to my problem). When I call sendmsg() using the address returned by our mmap() implementation (which is where the remap_pfn_range() call happens), the sendmsg() call fails with errno EFAULT.
I have tracked down the location in the kernel where -EFAULT is first returned: it happens in check_vma_flags() in mm/gup.c. The problem is that the above-mentioned flags are set in the vm_area_struct:
static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
{
    vm_flags_t vm_flags = vma->vm_flags;
    int write = (gup_flags & FOLL_WRITE);
    int foreign = (gup_flags & FOLL_REMOTE);

    if (vm_flags & (VM_IO | VM_PFNMAP))
        return -EFAULT; // <= This error propagates all the way back to userspace.
For background, the allocated memory is shared between the CPU and the FPGA in a Cyclone V SoC FPGA. The FPGA writes the bulk of the data to the buffers and requires the memory to be physically contiguous. We are currently using v5.3-rc8 of the Linux kernel.
The memory mapping works well for regular send() calls without any attempts at zero-copy, but now we need to improve the network transmission speed for our data. In investigating the zero-copy mode for sendmsg(), we have observed up to 50% improvements in data transfer rate when using regular userspace memory allocated through malloc().
I have tried clearing the VM_IO and VM_PFNMAP bits in vm_area_struct.vm_flags after the call to remap_pfn_range(). This results in sendmsg() actually "succeeding": the syscall returns the size of the buffer that I wished to send. However, no data arrives on the client side. Also, when I reboot the system, a lot of errors are printed on the serial console.
I don't expect what I did to be the correct way to do this. I have to assume that the vm_flags are set for a very good reason, and there are no conditions in remap_pfn_range() that would prevent the flags from being set.
Hence, my question in the title: Is there a way to map memory obtained from kmalloc() to userspace such that it can be passed to sendmsg() with MSG_ZEROCOPY?
Update: I see that there have been a few posts regarding this subject already:
Zero-copy user-space TCP send of dma_mmap_coherent() mapped memory
Map physical memory to userspace as normal, struct page backed mapping
I will try the suggested answers from those posts and report back here.
I have managed to implement a solution similar to what is described in steps 3 and 4 of Map physical memory to userspace as normal, struct page backed mapping.
My fault handler looks like this:
vm_fault_t vm_fault(struct vm_fault* vmf)
{
    unsigned long pos = vmf->vma->vm_pgoff / NPAGES;
    unsigned long offset = vmf->address - vmf->vma->vm_start;

    // Have to make the address-to-page translation manually, for unknown
    // reasons virt_to_page causes bus error when userspace writes to the
    // memory.
    unsigned long physaddr = __pa(memory_list[pos].kmalloc_area) + offset;
    unsigned long pfn = physaddr >> PAGE_SHIFT;
    struct page* page = pfn_to_page(pfn);

    // Increment refcount to page.
    get_page(page);

    return vmf_insert_page(vmf->vma, vmf->address, page);
}
and my mmap() implementation:
static int mmap_mmap(struct file* filp, struct vm_area_struct* vma)
{
    // Do not map to userspace using remap_pfn_range(), since those mappings
    // are incompatible with zero-copy for the sendmsg() syscall (due to the
    // VM_IO and VM_PFNMAP flags). Instead set up mapping using the fault handler.
    vma->vm_ops = &imageaccess_vm_operations;
    vma->vm_flags |= VM_MIXEDMAP; // Must be set, or else vmf_insert_page() will fail.

    return 0;
}
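To illustrate why a VM_MIXEDMAP mapping gets past the point where the remap_pfn_range() mapping failed, here is a small userspace model of only the first test in check_vma_flags(). The flag values are copied from what I believe include/linux/mm.h defines (treat them as an assumption for other kernel versions), and first_gup_check() and first_gup_check_demo() are hypothetical names for this sketch. The real function performs further checks that are not modelled here.

```c
#include <assert.h>
#include <errno.h>

/* Assumed flag values, copied from include/linux/mm.h for this model. */
#define VM_PFNMAP   0x00000400ul
#define VM_IO       0x00004000ul
#define VM_MIXEDMAP 0x10000000ul

/*
 * Hypothetical model of the first test in check_vma_flags():
 * any VMA marked VM_IO or VM_PFNMAP is rejected with -EFAULT.
 */
int first_gup_check(unsigned long vm_flags)
{
    if (vm_flags & (VM_IO | VM_PFNMAP))
        return -EFAULT;
    return 0;
}

/* Small usage demo comparing the two kinds of mapping. */
void first_gup_check_demo(void)
{
    /* Flags relevant here as set by remap_pfn_range(): rejected. */
    assert(first_gup_check(VM_IO | VM_PFNMAP) == -EFAULT);
    /* Flag set in the mmap() handler above: passes this test. */
    assert(first_gup_check(VM_MIXEDMAP) == 0);
}
```

This only shows that VM_MIXEDMAP is not one of the bits the first test looks at. It says nothing about the remaining checks in the get_user_pages() path.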
I have not figured out why virt_to_page() does not seem to work; if anyone has an idea, feel free to comment. This implementation works in any case.
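For anyone who wants to sanity-check the manual address-to-page translation in the fault handler, the arithmetic can be modelled in plain userspace C. This is only a sketch: fault_pfn() is a hypothetical helper, buf_phys stands in for __pa(memory_list[pos].kmalloc_area), the addresses in the usage example are made up, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Named MODEL_* to avoid clashing with system headers. */
#define MODEL_PAGE_SHIFT 12u

/*
 * Hypothetical userspace model of the fault handler arithmetic:
 *   offset   = faulting address - start of the VMA
 *   physaddr = physical base of the buffer + offset
 *   pfn      = physaddr >> PAGE_SHIFT
 */
uint64_t fault_pfn(uint64_t buf_phys, uint64_t vm_start, uint64_t fault_addr)
{
    /* A fault inside the VMA can never lie below its start address. */
    assert(fault_addr >= vm_start);

    uint64_t offset = fault_addr - vm_start;
    uint64_t physaddr = buf_phys + offset;

    return physaddr >> MODEL_PAGE_SHIFT;
}
```

With a made-up buffer at physical address 0x30000000 mapped at 0xb6f00000, a fault at 0xb6f02abc gives offset 0x2abc and therefore pfn 0x30002, i.e. the third page of the buffer.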