Search code examples
linuxvideoembeddedfilesystemsreal-time

Bursty writes to SD/USB stalling my time-critical apps on embedded Linux


I'm working on an embedded Linux project that interfaces an ARM9 to a hardware video encoder chip, and writes the video out to SD card or USB stick. The software architecture involves a kernel driver that reads data into a pool of buffers, and a userland app that writes the data to a file on the mounted removable device.

I am finding that above a certain data rate (around 750kbyte/sec) I start to see the userland video-writing app stalling for maybe half a second, about every 5 seconds. This is enough to cause the kernel driver to run out of buffers - and even if I could increase the number of buffers, the video data has to be synchronised (ideally within 40ms) with other things that are going on in real time. Between these 5 second "lag spikes", the writes complete well within 40ms (as far as the app is concerned - I appreciate they're buffered by the OS)

I think this lag spike is to do with the way Linux is flushing data out to disk - I note that pdflush is designed to wake up every 5s, my understanding is that this would be what does the writing. As soon as the stall is over the userland app is able to quickly service and write the backlog of buffers (that didn't overflow).

I think the device I'm writing to has reasonable ultimate throughput: copying a 15MB file from a memory fs and waiting for sync to complete (and the usb stick's light to stop flashing) gave me a write speed of around 2.7MBytes/sec.

I'm looking for two kinds of clues:

  1. How can I stop the bursty writing from stalling my app - perhaps process priorities, realtime patches, or tuning the filesystem code to write continuously rather than burstily?

  2. How can I make my app(s) aware of what is going on with the filesystem in terms of write backlog and throughput to the card/stick? I have the ability to change the video bitrate in the hardware codec on the fly which would be much better than dropping frames, or imposing an artificial cap on maximum allowed bitrate.

Some more info: this is a 200MHz ARM9 currently running a Montavista 2.6.10-based kernel.

Updates:

  • Mounting the filesystem SYNC causes throughput to be much too poor.
  • The removable media is FAT/FAT32 formatted and must be as the purpose of the design is that the media can be plugged into any Windows PC and read.
  • Regularly calling sync() or fsync() say, every second causes regular stalls and unacceptably poor throughput
  • I am using write() and open(O_WRONLY | O_CREAT | O_TRUNC) rather than fopen() etc.
  • I can't immediately find anything online about the mentioned "Linux realtime filesystems". Links?

I hope this makes sense. First embedded Linux question on stackoverflow? :)


Solution

  • For the record, there turned out to be two main aspects that seem to have eliminated the problem in all but the most extreme cases. This system is still in development and hasn't been thoroughly torture-tested yet but is working fairly well (touch wood).

    The big win came from making the userland writer app multi-threaded. It is the calls to write() that block sometimes: other processes and threads still run. So long as I have a thread servicing the device driver and updating frame counts and other data to sychronise with other apps that are running, the data can be buffered and written out a few seconds later without breaking any deadlines. I tried a simple ping-pong double buffer first but that wasn't enough; small buffers would be overwhelmed and big ones just caused bigger pauses while the filesystem digested the writes. A pool of 10 1MB buffers queued between threads is working well now.

    The other aspect is keeping an eye on ultimate write throughput to physical media. For this I am keeping an eye on the stat Dirty: reported by /proc/meminfo. I have some rough and ready code to throttle the encoder if Dirty: climbs above a certain threshold, seems to vaguely work. More testing and tuning needed later. Fortunately I have lots of RAM (128M) to play with giving me a few seconds to see my backlog building up and throttle down smoothly.

    I'll try to remember to pop back and update this answer if I find I need to do anything else to deal with this issue. Thanks to the other answerers.