Tags: c++, multithreading, asynchronous, boost-asio, producer-consumer

boost::asio::async_write with large files and memory consumption


I'm writing an HTTP server using boost::asio. To avoid loading a large file entirely into memory before sending it, I read it chunk by chunk and send each chunk over the network with boost::asio::async_write.

The problem is that my producer (the function that reads from the file) is much faster than the consumer (boost::asio::async_write), so buffers pile up and memory consumption becomes huge for big files.

I want to avoid this by bounding the list of pending buffers. It looks like a simple producer/consumer problem, but I don't want to block a thread to solve it.

I use boost::asio::io_service with a thread pool of n threads (n is configurable). If too many requests for large files arrive at once, I don't want to end up with a server that no longer serves any requests.

So my questions are:

  • How can I design this mechanism without blocking a thread?
  • Should I check the list size and, if it is already too large, start a deadline timer that does an io_service::post to continue reading my file?
  • Is there a better way to handle this?


Solution

  • Blocking the reading thread is not a good idea if you want to prevent a DoS attack. What you are trying to avoid is allocating too many resources (memory) at the same time. But the number of open file streams is limited as well. If you start blocking the reading threads in an overload situation, you can very quickly end up with a very large number of open file streams. If you catch the error, your program may not crash, but it is certainly undesirable behaviour, since you can no longer open other files (e.g. log files).

    To prevent this, you have to manage both resources. There are many ways to limit the amount of allocated resources. For memory, you could use a ring buffer for the chunks of read data. You can also use atomic counters to keep track of the allocated resources and enforce an upper bound. Semaphores can solve this kind of problem as well; I would prefer those. The pseudocode would look like this:

    Semaphore filestreams(maxNumberOfFilestreams);
    Semaphore memory(maxNumberOfAllocatedChunks);
    
    // Worker thread that reads one file
    void run() {
        filestreams.wait();        // reserve a file stream slot before opening
        while(!eof) {
            memory.wait();         // reserve a chunk slot before allocating
            // Allocate and read
        }
        file.close();
        filestreams.notify();      // give the file stream slot back
    }
    
    // Sending thread
    
    void run() {
        while(true) {
            // grab chunk, send and free memory
            memory.notify();       // give the chunk slot back
        }
    }
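
The pseudocode above could be turned into something runnable roughly as follows. This is only a sketch under assumptions: `Semaphore`, `TransferStats` and `simulateTransfer` are hypothetical names, the file read and the network send are replaced by stand-ins, and only the memory semaphore is modelled. Note that `wait()` blocks the calling thread, so this variant fits dedicated reader/sender threads rather than handlers running on the io_service pool.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Minimal counting semaphore (hypothetical helper built on mutex + condition variable).
class Semaphore {
public:
    explicit Semaphore(std::size_t count) : count_(count) {}

    // Blocks until a slot is available, then takes it.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // Returns one slot and wakes one waiting thread.
    void notify() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t count_;
};

struct TransferStats {
    std::size_t chunksSent = 0;    // chunks handed to the (simulated) network
    std::size_t peakQueued = 0;    // largest number of chunks queued at once
};

// Simulates one transfer: a reader thread produces chunks, a sender thread consumes them.
TransferStats simulateTransfer(std::size_t totalChunks, std::size_t maxChunks) {
    Semaphore memory(maxChunks);   // free chunk slots
    Semaphore ready(0);            // chunks waiting to be sent
    std::mutex queueMutex;
    std::deque<std::vector<char>> queue;
    TransferStats stats;

    std::thread reader([&] {
        for (std::size_t i = 0; i < totalChunks; ++i) {
            memory.wait();                         // blocks while maxChunks are outstanding
            std::vector<char> chunk(4096, 'x');    // stand-in for reading from the file
            {
                std::lock_guard<std::mutex> lock(queueMutex);
                queue.push_back(std::move(chunk));
                stats.peakQueued = std::max(stats.peakQueued, queue.size());
            }
            ready.notify();
        }
    });

    std::thread sender([&] {
        for (std::size_t i = 0; i < totalChunks; ++i) {
            ready.wait();
            std::vector<char> chunk;
            {
                std::lock_guard<std::mutex> lock(queueMutex);
                assert(!queue.empty());
                chunk = std::move(queue.front());
                queue.pop_front();
            }
            ++stats.chunksSent;                    // stand-in for sending the chunk
            memory.notify();                       // chunk is done, free its slot
        }
    });

    reader.join();
    sender.join();
    return stats;
}
```

Calling `simulateTransfer(100, 4)` should report 100 chunks sent while never holding more than 4 chunks in the queue, whatever the relative speed of the two threads.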
    

    Keep in mind that open TCP connections are a limited resource too.
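
The atomic counter option mentioned above can be made non-blocking, which is closer to what the question asks for. The following is a minimal sketch, not a definitive implementation: `ChunkBudget`, `tryAcquire` and `release` are hypothetical names. The assumed usage is that the read step calls `tryAcquire()` before allocating a chunk and simply returns when it fails, while the async_write completion handler calls `release()` and then schedules the next read.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical helper: bounds the number of chunks that are allocated at once.
class ChunkBudget {
public:
    explicit ChunkBudget(std::size_t limit) : limit_(limit), inUse_(0) {}

    // Tries to reserve one chunk slot. Never blocks; returns false at the limit.
    bool tryAcquire() {
        std::size_t current = inUse_.load(std::memory_order_relaxed);
        while (current < limit_) {
            // On failure compare_exchange_weak reloads `current`,
            // so the loop re-checks the limit with the fresh value.
            if (inUse_.compare_exchange_weak(current, current + 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
                return true;
            }
        }
        return false;
    }

    // Gives one slot back; call once for every successful tryAcquire().
    void release() {
        const std::size_t previous = inUse_.fetch_sub(1, std::memory_order_release);
        assert(previous > 0 && "release() without matching tryAcquire()");
        (void)previous;  // avoids an unused-variable warning when NDEBUG is defined
    }

    std::size_t inUse() const { return inUse_.load(std::memory_order_relaxed); }

private:
    const std::size_t limit_;
    std::atomic<std::size_t> inUse_;
};
```

Because a failed `tryAcquire()` returns immediately, no pool thread is parked while the network catches up; the trade-off is that the caller has to arrange for the read to be retried, for example from the write completion handler.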