Tags: linux, memory, linux-device-driver, mmap, pci

Access PCI memory BAR with low latency (Linux)


Background:

I have a PCI card that is basically a clock. It gets the time via GPS and stores the current time in a certain register.

Goal:

I want to read a limited number of registers/bytes (for example, the current time) over and over again, with the lowest possible latency. (The clock provides very high precision, and I expect to lose precision the higher the latency gets.) The operating system is Red Hat. The programming language is C/C++. I also want to write to the device memory; for writes, latency is not an issue.

Possible Ways to go:

I see these ways. If you see another, please tell me:

  1. Write a Linux kernel module (driver) that creates a character device (or one character device per register to read). A user space application can then do a read() on the /dev/ file(s).
  2. DMA
  3. mmap the sysfs resourceX file into user space from a user space application (via the mmap system call), like here for example (a minimal sketch follows after this list).
  4. Write a Linux kernel module (driver) that implements an mmap file operation.
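
For illustration, here is a minimal sketch of way 3. It assumes a made-up PCI address (0000:03:00.0), that BAR0 is the memory BAR of interest, and that the time register sits at offset 0x0 of that BAR; all of those are assumptions to adjust for the real card.

```c
/* Minimal sketch of way 3: mmap the sysfs resource0 file of the card.
 * The PCI address (0000:03:00.0) and the register offset (0x0) are
 * placeholders, not values from the real device. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* resource0 corresponds to BAR0; the path is hypothetical. */
    int fd = open("/sys/bus/pci/devices/0000:03:00.0/resource0",
                  O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Map one page of the BAR into this process's address space. */
    volatile uint8_t *bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Once the mapping exists, each read is a plain MMIO load with no
     * system call on the hot path. Offset 0x0 is an assumption. */
    uint64_t now = *(volatile uint64_t *)(bar + 0x0);
    printf("raw time register: %#llx\n", (unsigned long long)now);

    munmap((void *)bar, 4096);
    close(fd);
    return 0;
}
```

The mmap call itself is the only expensive part; after that, reading the register is just a load from the mapped address in a loop.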

Questions:

  1. Which way has the lowest latency when it comes to actually reading the register? I am aware that mmap causes a lot of overhead in the kernel, but as far as I understand, that applies only to the initial setup.
  2. Is way 3 a legitimate way to go? It looks like a hack to me. How can I determine the /sys/ path automatically from the application? (A sketch of scanning the sysfs tree follows after these questions.)
  3. Is there a difference between ways 3 and 4? I am new to PCI driver programming, and I don't think I really understand how way 4 works. I read this (and other chapters of that book), but maybe you can give me a hint or an example. I would appreciate that.
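
Regarding question 2, one common approach is to identify the card by its PCI vendor/device ID instead of hard-coding a bus address. A rough sketch, with placeholder IDs (0x10ee/0x7011) that would have to be replaced by the real ones:

```c
/* Sketch: find the card's sysfs directory by scanning
 * /sys/bus/pci/devices/ and comparing the "vendor" and "device" files.
 * The IDs below are placeholders. */
#include <dirent.h>
#include <stdio.h>

static unsigned read_hex_file(const char *path)
{
    unsigned v = 0;
    FILE *f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%x", &v) != 1)
            v = 0;
        fclose(f);
    }
    return v;
}

int main(void)
{
    const unsigned want_vendor = 0x10ee;  /* placeholder vendor ID */
    const unsigned want_device = 0x7011;  /* placeholder device ID */

    DIR *dir = opendir("/sys/bus/pci/devices");
    if (!dir) {
        perror("opendir");
        return 1;
    }

    struct dirent *e;
    while ((e = readdir(dir)) != NULL) {
        if (e->d_name[0] == '.')
            continue;

        char path[512];
        snprintf(path, sizeof(path),
                 "/sys/bus/pci/devices/%s/vendor", e->d_name);
        unsigned vendor = read_hex_file(path);
        snprintf(path, sizeof(path),
                 "/sys/bus/pci/devices/%s/device", e->d_name);
        unsigned device = read_hex_file(path);

        if (vendor == want_vendor && device == want_device) {
            /* This is the path to mmap for BAR0. */
            printf("/sys/bus/pci/devices/%s/resource0\n", e->d_name);
            break;
        }
    }
    closedir(dir);
    return 0;
}
```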

Solution

  • Method 3 or 4 should work fine. There’s no difference between them with respect to latency. Latency would be on the order of 100 ns.

    Method 4 would be needed if you need to initialize the device, control which applications are allowed to access it, enforce one reader at a time, etc. Method 3 does seem like a bit of a hack because it skips all of this, but it is simpler if you don’t need such things. (A rough kernel-side sketch of method 4 follows at the end of this answer.)

    A character device is definitely higher latency, because it requires a kernel transition each time the device is read.

    The latency of a DMA method depends entirely on how frequently the device writes the time to memory. It is lower latency for the CPU to access memory than MMIO, but if the device only does DMA once a millisecond, then that would be your latency. Also, that method generates a lot of useless DMA traffic, since the CPU would read the value far less often than it is written.
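
Regarding how method 4 could look, here is a very rough sketch rather than a complete driver: a PCI driver that registers a misc character device and implements mmap by mapping BAR0 into the calling process. The vendor/device IDs (0x10ee/0x7011) and the device name are placeholders, and a real driver would add pci_request_regions(), error handling, locking, and so on.

```c
/* Sketch of method 4: expose BAR0 via a misc char device with an mmap op. */
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/mm.h>

static resource_size_t bar0_start;
static resource_size_t bar0_len;

static int clockcard_mmap(struct file *file, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;

    if (size > bar0_len)
        return -EINVAL;

    /* MMIO must be mapped uncached. */
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

    /* Map the physical BAR pages straight into the user mapping. */
    return io_remap_pfn_range(vma, vma->vm_start,
                              bar0_start >> PAGE_SHIFT,
                              size, vma->vm_page_prot);
}

static const struct file_operations clockcard_fops = {
    .owner = THIS_MODULE,
    .mmap  = clockcard_mmap,
};

static struct miscdevice clockcard_misc = {
    .minor = MISC_DYNAMIC_MINOR,
    .name  = "clockcard0",           /* shows up as /dev/clockcard0 */
    .fops  = &clockcard_fops,
};

static int clockcard_probe(struct pci_dev *pdev,
                           const struct pci_device_id *id)
{
    int ret = pci_enable_device(pdev);
    if (ret)
        return ret;

    bar0_start = pci_resource_start(pdev, 0);
    bar0_len   = pci_resource_len(pdev, 0);

    return misc_register(&clockcard_misc);
}

static void clockcard_remove(struct pci_dev *pdev)
{
    misc_deregister(&clockcard_misc);
    pci_disable_device(pdev);
}

static const struct pci_device_id clockcard_ids[] = {
    { PCI_DEVICE(0x10ee, 0x7011) },  /* placeholder IDs */
    { 0 }
};
MODULE_DEVICE_TABLE(pci, clockcard_ids);

static struct pci_driver clockcard_driver = {
    .name     = "clockcard",
    .id_table = clockcard_ids,
    .probe    = clockcard_probe,
    .remove   = clockcard_remove,
};
module_pci_driver(clockcard_driver);

MODULE_LICENSE("GPL");
```

User space would then open /dev/clockcard0 and mmap it exactly as in method 3; the difference is that the driver decides who may map the BAR and can initialize the device before anyone reads it.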