Search code examples
csocketsclient-serverpolling

Efficient way to handle/define the array of struct pollfd for poll syscall


I am using poll for implementing the server side of a client-server model with multiplexing.

There's a main loop that the server is running.
In each loop I am checking all the fds at the array of struct pollfd until I find n sockets that have content for me (where n is the number that poll returned).
Also, poll is used without timeout (-1 as argument).

So if a new connection is requested at the listening socket I am inserting the socket and some client information into a active connections list I am keeping.

What is the most efficient way to handle the array of poll?

Should I define a small array of size 10 and if there are more clients re allocate memory for an array of size 20 or 50?

Or should I: (a) free the array of struct pollfd type, and (b) reallocate it with size that is equal to the size of the list(+1 for the listening socket) => every time a client closes a connection (and thus I have to remove an element from the array (probably setting the socket to -1, so that poll ignores it) resulting in unused space in array)

What do you recommend?

Thank you for your time


Edit: I think I've found a solution with realloc and shifting with memmove each time a client disconnects to cover his socket at the fds array.


Solution

  • What is the most efficient way to handle the array of poll?

    Implementation-defined. Don't optimise until you know you have to; That's called premature optimisation, and it's a waste of time.

    Should I define a small array of size 10 and if there are more clients re allocate memory for an array of size 20 or 50?

    If your profiler determines that your pollfds are a significant bottleneck in your program, and your boss (or professor) says your program "isn't fast enough", go for it. If the largest your list will get is 50, then just use a static array, set the fd members to -1, don't bother shifting things around when you remove from the array... The bottlenecks you're referring to will be insignificant for such a small number.

    If you're trying to handle much larger numbers of clients as a worst case, you might be concerned about unused space trailing the array when handling smaller numbers... and hence would choose a resizable pollfd array.

    Or should I: (a) free the array of struct pollfd type, and (b) reallocate it with size that is equal to the size of the list(+1 for the listening socket) => every time a client closes a connection (and thus I have to remove an element from the array (probably setting the socket to -1, so that poll ignores it) resulting in unused space in array)

    The simplest algorithm I can think of doubles space on resizing. Perhaps the most significant benefit to doubling on resize is the reduction in calls to realloc compared to linear resizing; Rather than accepting 1024 connections and calling realloc 1024 times, I'd prefer to accept 1024 connections and call realloc 10 or 11 times. It also seems simpler to me because there is no need to store the array capacity separately to the used count; You can use a property of binary numbers to your advantage: (n - 1) & n == 0 when n is a power of two.

    #define is_power_of_two(n) !(((n) - 1) & (n))
    
    struct pollfd *pollfd_array_resize(struct pollfd *array, size_t size) {
        const size_t max_size = SIZE_MAX / sizeof *array;
        if (size == max_size) { return NULL; }
        if (is_power_of_two(size)) {
            array = realloc(array, (size > 0 ? size * 2 : 1) * sizeof *array);
        }
        return array;
    }
    

    Edit: I think I've found a solution with realloc and shifting with memmove each time a client disconnects to cover his socket at the fds array.

    That seems like a fairly good fix. It increases cache locality, but at the cost of having to call memmove and realloc every time a client disconnects.

    I wouldn't even consider "defragmenting" the array until my profiler indicates that poll is using up too much processor time. When that happens, I'd consider setting the fd member to something negative (as you were doing prior to your edit) and putting the index of the item to remove into a recently_freed stack. When inserting, I'd choose items from that stack if possible. If you must defragment, I suggest doing so based upon the size of that stack.

    size_t pollfd_array_defrag(struct pollfd *array, size_t size) {
        size_t new_size = 0;
        for (size_t x = 0; x + 1 < size; x++) {
            if (array[x].fd < 0) {
                continue;
            }
    
            array[new_size++] = array[x];
        }
        return new_size;
    }
    
    int main(void) {
        size_t size = 0, index;
        struct pollfd *array = NULL;
        struct index_stack *recently_freed = NULL;
    
        /*----------------------------*
         * ... snip for insertion ... */
        if (recently_freed && recently_freed->size > 0) {
            index = index_stack_pop(&recently_freed);
        }
        else {
            struct pollfd *temp = pollfd_array_resize(array, size);
    
            if (temp == NULL) {
                /* TODO: Handle memory allocation errors */
            }
    
            array = temp;
            index = size++;
        }
    
        array[index] = (struct pollfd) { .fd = your_fd,
                                         .events = your_events,
                                         .revents = your_revents };
    
        /*--------------------------*
         * ... snip for removal ... */
        index_stack_push(&recently_freed, index);
        array[index] = -1;
    
        /*----------------------------------*
         * ... snip for defragmentation ... */
        if (is_power_of_two(recently_freed->size) && recently_freed->size * 2 + 1 > size) {
            size = pollfd_array_defrag(array);
            /* array will shrink in future insertions */
            index_stack_destroy(&recently_freed);
            recently_freed = NULL;
        }
    }