
How to efficiently draw framebuffer content?


I need to display a RAM-based framebuffer for a virtual GPU device that has no real display connected to it. What I have is an mmap'ed chunk of memory, obtained after DRM_IOCTL_MODE_MAP_DUMB, in RGB32 format. Currently I'm using an MIT-SHM shared pixmap created via XShmCreatePixmap() like this:

shminfo.shmid = shmget(IPC_PRIVATE, bytes, IPC_CREAT|0777);
shminfo.readOnly = False;
shminfo.shmaddr = shmat(shminfo.shmid, 0, 0);
shmctl(shminfo.shmid, IPC_RMID, 0); 
XShmAttach(dpy, &shminfo);
pixmap = XShmCreatePixmap(dpy, window, shminfo.shmaddr, &shminfo, width, height, 24);

and then simply

while (1) {
    struct timespec ts = {0, 999999999L / 30};

    nanosleep(&ts, NULL);

    memcpy(shminfo.shmaddr, mem, bytes);
    XCopyArea(dpy, pixmap, window, gc, 0, 0, width, height, 0, 0);
    XFlush(dpy);
}

So it loops 30 times per second, doing a memcpy followed by XCopyArea. The problem is that it uses a lot of CPU: 50% on a powerful machine. Is there a better way? I can think of two possible improvements:

  1. Get rid of the memcpy and pass the mmap'ed memory directly to MIT-SHM, but it looks like the MIT-SHM API doesn't support this.

  2. Get some kind of 'content changed' notification to get rid of dumb sleeping (but I haven't found anything appropriate).

Any ideas?

Update: The bottleneck is the memcpy; with it removed, CPU usage becomes negligible. The problem seems to be that there's no way to share previously mmap'ed memory (if I understood the API correctly), so I'm forced to copy the whole buffer every time. I've also tried glDrawPixels() and SDL surfaces; both turned out to be even slower than MIT-SHM.
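
A possible way to reduce the per-frame copying, offered here as a sketch and not as something from the original post, is to compare each row of the mapped buffer against the previous copy and transfer only the rows that differ. The names `row_range` and `copy_changed_rows` are hypothetical, and `stride` is assumed to be the number of bytes per row. Note that this still reads every byte of both buffers, so it only saves writes and X traffic; if reading from the mapping is the slow part, it may not help.

```c
#include <stddef.h>
#include <string.h>

/* Inclusive range of rows that changed; first > last means "nothing changed".
 * Hypothetical helper type, not part of Xlib or DRM. */
typedef struct {
    int first;
    int last;
} row_range;

/* Copy only the rows of src that differ from dst, and report the
 * bounding range of rows that were copied. */
static row_range copy_changed_rows(unsigned char *dst, const unsigned char *src,
                                   size_t stride, int height)
{
    row_range r = { height, -1 };   /* empty range by default */

    for (int y = 0; y < height; y++) {
        size_t off = (size_t)y * stride;

        if (memcmp(dst + off, src + off, stride) != 0) {
            memcpy(dst + off, src + off, stride);
            if (y < r.first)
                r.first = y;
            r.last = y;
        }
    }
    return r;
}
```

In the loop above, `dst` would be shminfo.shmaddr and `src` would be mem. When the returned range is not empty, XCopyArea could be called with src_y and dest_y set to `first` and a height of `last - first + 1`, and skipped entirely when nothing changed.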

Update: It turns out that MIT-SHM isn't well suited for a task like this. Its main purpose is creating a buffer and writing (rendering) to it without the overhead of X IPC. I don't need to write anything, just "forward" an existing buffer to X. In this scenario there's no performance difference between shared pixmaps, shared images and regular X images (XCreateImage).

Conclusion: so far I haven't found an API that allows rendering existing buffers without copying the data around every time.


Solution

  • For X11, use XShmCreateImage, write to XImage.data, and make the result visible with XShmPutImage, making sure to pass False for the send_event parameter. You may also want to disable graphics exposure events for the current GC (for example with XSetGraphicsExposures); setting PointerMotionHintMask can also help.

    SDL1 does most of the above but will use a shadow surface if there is a mismatch between user and display format and may perform unexpected color conversion. SDL2 tries to use hardware acceleration and may perform unexpected scaling and/or filtering. Make sure you're getting what you ask for to avoid hidden ops.

    50% CPU usage sounds like a lot for this blit at 30 fps; I'd rewrite the sleep as follows, just in case.

    do
        errno = 0;
    while ( nanosleep(&ts, &ts) && errno == EINTR );