I have a camera connected to a Cortex-A9 OMAP4 board. The V4L2 video frames are allocated in the 3.4 kernel and mapped to user space with:
static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
{
	struct vb2_dc_buf *buf = buf_priv;

	if (!buf) {
		printk(KERN_ERR "No buffer to map\n");
		return -EINVAL;
	}

	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	return vb2_mmap_pfn_range(vma, buf->dma_addr, buf->size,
				  &vb2_common_vm_ops, &buf->handler);
}
I have also tested:
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
A complex post-processing algorithm, written in NEON assembly, runs on each frame. It accesses the frame from user space through the standard V4L2 mmap interface:
mmap(NULL, buf.length, PROT_READ | PROT_WRITE, MAP_SHARED, camera->fd, buf.m.offset);
The processing time of this optimized algorithm is as follows:
x ms: user-space malloc allocation of a fake frame (reference)
10*x ms: kernel allocation with pgprot_noncached
4*x ms: kernel allocation with pgprot_writecombine
x ms: kernel allocation with no pgprot call (i.e. the default cached mapping)
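The post does not say how these timings were taken. One way to compare the mappings is sketched below, assuming a POSIX system with clock_gettime; `time_frame_ms` and the placeholder workload `invert_frame` are hypothetical names, and the real NEON routine would be passed in instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*frame_fn)(uint8_t *frame, size_t len);

/*
 * Hypothetical helper: run `fn` over the frame `runs` times and return
 * the average wall-clock time per run in milliseconds.
 */
static double time_frame_ms(frame_fn fn, uint8_t *frame, size_t len, int runs)
{
	struct timespec t0, t1;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < runs; i++)
		fn(frame, len);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return ((t1.tv_sec - t0.tv_sec) * 1e3 +
		(t1.tv_nsec - t0.tv_nsec) / 1e6) / runs;
}

/* Placeholder workload (stand-in for the NEON code): invert each byte. */
static void invert_frame(uint8_t *frame, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		frame[i] = (uint8_t)~frame[i];
}
```

Calling it once on a malloc'ed buffer and once on each mmap'ed variant gives figures comparable to the list above.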
The problem is that without any pgprot_* call, I get some very strange noise, i.e. a few consecutive black pixels at random positions in the video. The noise disappears under some specific circumstances, when all allocated memory ranges are accessed.
Finally, if I simply do a memcpy of the frame while the memory is mapped with the original pgprot_noncached, there doesn't seem to be any performance issue, but I can't afford to add a memcpy.
How can I fix this situation, i.e. get a kernel memory allocation without any noise that performs as well as a user-space malloc?
The NEON code uses many vld1.u8 and vst1.u8 instructions with different increments.
For reference, the solution was to invalidate and flush the memory region (outer_inv_range and outer_flush_range).
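A rough sketch of what that could look like, under several assumptions: the buffer is mapped cached (no pgprot call), buf->dma_addr is a physical address, and the helpers `frame_sync_for_cpu` / `frame_sync_for_device` are hypothetical names, with the exact call sites in the driver not given here. The `#else` branch only provides stand-ins so the fragment compiles outside a kernel tree. These calls act on the outer (L2) cache only; whether inner-cache maintenance is also needed is not stated above.

```c
#ifdef __KERNEL__
#include <linux/types.h>
#include <asm/outercache.h>	/* outer_inv_range(), outer_flush_range() */
#else
/* Stand-ins for building outside the kernel; they only record the call. */
#include <stddef.h>
typedef unsigned long phys_addr_t;
static phys_addr_t last_start, last_end;
static int last_op;		/* 1 = invalidate, 2 = flush */

static void outer_inv_range(phys_addr_t start, phys_addr_t end)
{
	last_op = 1; last_start = start; last_end = end;
}

static void outer_flush_range(phys_addr_t start, phys_addr_t end)
{
	last_op = 2; last_start = start; last_end = end;
}
#endif

/*
 * Hypothetical helper: call after the camera has written a frame and
 * before the CPU reads it, so stale outer-cache lines are discarded.
 */
static void frame_sync_for_cpu(phys_addr_t start, size_t size)
{
	outer_inv_range(start, start + size);
}

/*
 * Hypothetical helper: call after the CPU has modified the frame and
 * before the buffer is handed back to the hardware, so dirty lines are
 * written back and invalidated.
 */
static void frame_sync_for_device(phys_addr_t start, size_t size)
{
	outer_flush_range(start, start + size);
}
```

In a driver these would be called with buf->dma_addr and buf->size around each dequeue and queue of a buffer.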