I need to display a RAM-based framebuffer for a virtual GPU device that doesn't have a real display connected to it. What I have is an mmap'ed chunk of memory, obtained after DRM_IOCTL_MODE_MAP_DUMB, in RGB32 format. Currently I'm using an MIT-SHM shared pixmap created via XShmCreatePixmap() like this:
shminfo.shmid = shmget(IPC_PRIVATE, bytes, IPC_CREAT|0777);
shminfo.readOnly = False;
shminfo.shmaddr = shmat(shminfo.shmid, 0, 0);
shmctl(shminfo.shmid, IPC_RMID, 0);
XShmAttach(dpy, &shminfo);
pixmap = XShmCreatePixmap(dpy, window, shminfo.shmaddr, &shminfo, width, height, 24);
and then simply
while (1) {
struct timespec ts = {0, 999999999L / 30};
nanosleep(&ts, NULL);
memcpy(shminfo.shmaddr, mem, bytes);
XCopyArea(dpy, pixmap, window, gc, 0, 0, width, height, 0, 0);
XFlush(dpy);
}
So it loops 30 times per second, doing a memcpy followed by XCopyArea. The problem is that it uses a lot of CPU: 50% on a powerful machine. Is there any better way? I could think of two possible improvements:
Get rid of memcpy and just pass mmap'ed memory to MIT-SHM but it looks like MIT-SHM API doesn't support this.
Get some kind of 'content changed' notification to get rid of dumb sleeping (but I haven't found anything appropriate).
Any ideas?
Update: The bottleneck is the memcpy; if it is removed, CPU usage becomes negligible. The problem seems to be that there's no way to share previously mmap'ed memory (if I understood the API correctly), so I'm forced to copy the whole buffer every time. I've also tried glDrawPixels() and SDL surfaces; both appeared to be even slower than MIT-SHM.
Update: It turns out that MIT-SHM isn't well suited for a task like this. Its main purpose is creating a buffer and writing (rendering) to it without the overhead of X IPC. I don't need to write anything, just "forward" an existing buffer to X. In this scenario there's no performance difference between shared pixmaps, shared images and regular X images (XCreateImage).
Conclusion: so far I haven't found an API that allows rendering existing buffers without copying the data around every time.
For X11 use XShmCreateImage, write to XImage.data and make visible with XShmPutImage, making sure to pass False for the send_event parameter. You may also want to disable exposure events for the current GC; setting PointerMotionHintMask can also help.
SDL1 does most of the above but will use a shadow surface if there is a mismatch between user and display format and may perform unexpected color conversion. SDL2 tries to use hardware acceleration and may perform unexpected scaling and/or filtering. Make sure you're getting what you ask for to avoid hidden ops.
50% CPU usage sounds like a lot for this blit at 30 fps. I'd rewrite the sleep as follows, just in case:
do
errno = 0;
while ( nanosleep(&ts, &ts) && errno == EINTR );
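The same retry idea wrapped in a function, as a minimal sketch. The name sleep_full() is mine, not from any library. The point is that nanosleep() writes the unslept remainder into its second argument when a signal interrupts it, so passing the same struct for both arguments resumes the wait instead of cutting the frame interval short.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleeps for the whole interval even if signals arrive.
 * Returns 0 on success, -1 on any nanosleep() error other than EINTR. */
int sleep_full(long nanoseconds)
{
    struct timespec ts;

    ts.tv_sec = nanoseconds / 1000000000L;
    ts.tv_nsec = nanoseconds % 1000000000L;

    /* On EINTR, ts already holds the remaining time, so just call again. */
    while (nanosleep(&ts, &ts) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

For the 30 fps loop in the question, that would be sleep_full(1000000000L / 30) at the top of each iteration.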