c, freebsd, net-snmp

File descriptor leak leads to error message flood from net-snmp. Sound familiar?


I've run into a strange problem where a huge number of messages from snmplib's snmp_synch_response() are managing to fill up a 60GB hard drive within about three hours. The messages are all "Use snmp_sess_select_info2() for processing large file descriptors", sometimes repeated over a hundred times per line. I'm still working with the customer to figure out how to reproduce this in-house, but I thought I'd ask here in case it was an old issue or, at least, seen by somebody else in some fashion.
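For anyone following along: the wording of the message suggests that some descriptor number has grown too large for a plain fd_set, which select() limits to values below FD_SETSIZE. That reading is an assumption on my part and is not confirmed anywhere in this post. A minimal sketch of the check, where fits_in_fd_set() is a hypothetical helper name and not part of net-snmp:

```c
#include <sys/select.h>

/*
 * Hypothetical helper (not a net-snmp function): returns 1 if fd can be
 * stored in a plain fd_set, 0 if it is negative or too large for one.
 */
static int fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}
```

If a freshly opened socket fails this check, the process is already holding a lot of descriptors open, whatever library ends up reporting it.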

Here's the basic system info: FreeBSD 8.1-RELEASE-p2 on i386, with Net-SNMP 5.5.

Below is a simplified snippet of the key parts of my code. The code first makes a list of tasks with initialized, but not open, sessions. Elsewhere, each task, up to a small limit (64 in this case), is forked and the children open the SNMP session sockets with snmp_open(), and so on. I've scoured each of set(), get(), and getnext(), and am sure that they all call snmp_close() appropriately — there aren't any early returns or other jumps over those calls — so I don't think that I'm explicitly leaking any sockets, but descriptors must be hanging around for some reason. Does this ring any bells for anybody?
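One way to test the "descriptors must be hanging around" suspicion is to count the open descriptors in a child before and after each call. The following is only a rough sketch: count_open_fds() is a hypothetical helper, and the 65536 cap is an arbitrary guard against very large descriptor limits.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical diagnostic helper: count the descriptors currently open
 * in this process by probing each number with fcntl(F_GETFD).
 */
static int count_open_fds(void)
{
    long max = sysconf(_SC_OPEN_MAX);
    int n = 0;

    /* Arbitrary cap so the probe stays cheap if the limit is huge or unknown. */
    if (max < 0 || max > 65536)
        max = 65536;

    for (int fd = 0; fd < max; fd++)
        if (fcntl(fd, F_GETFD) != -1)   /* fails with EBADF for closed slots */
            n++;
    return n;
}
```

Logging this value at the start of each child would show whether a child begins life with more descriptors than expected, before it has opened any SNMP session at all.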

for(…){
    …
    snmp_sess_init(&task->sess_info);
    addtask(taskList, task);
    …
}

…

for(task = taskList; task && nkids < maxkids; task = task->next){
    if(fork() == 0){
        set(task);
        get(task);
        getnext(task);
        …
    }
    nkids++;
}

void set(Task *task){
    …
    sess = snmp_open(&task->sess_info);
    …
    pdu = snmp_pdu_create(SNMP_MSG_SET);
    …
    status = snmp_synch_response(sess, pdu, &resp);
    // check return, retr
    snmp_close(sess);
}

void get(Task *task){
    …
    sess = snmp_open(&task->sess_info);
    …
    pdu = snmp_pdu_create(SNMP_MSG_GET);
    …
    status = snmp_synch_response(sess, pdu, &resp);
    // check return, read variables
    snmp_close(sess);
}

void getnext(Task *task){
    …
    sess = snmp_open(&task->sess_info);
    for(obj = task->objs; obj; obj = obj->next){
        …
        pdu = snmp_pdu_create(SNMP_MSG_GETNEXT);
        …
        status = snmp_synch_response(sess, pdu, &resp);
        // check return, read variables
    }
    snmp_close(sess);
}

Solution

  • In case anybody runs into something similar: this (unsurprisingly) turned out to have nothing to do with net-snmp. Each child process communicates back to the parent over its own socket. Because fork() duplicates the parent's open descriptors, every child was inheriting the parent's whole list of sockets; the fix was simply to close the sockets in that list in the child code.
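
The fix described above can be sketched roughly as follows. This is not the original code: MAXKIDS, parent_socks, close_inherited_socks() and spawn_children() are all hypothetical names, and socketpair() stands in for whatever kind of socket the real program used. Each child reports how many of the parent's earlier sockets it still holds open, so the effect of the cleanup is visible.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAXKIDS 4   /* hypothetical; the real limit in the post was 64 */

/* Hypothetical bookkeeping: the parent's end of each child's socket, -1 if unused. */
static int parent_socks[MAXKIDS];

/* Run in the child right after fork(): close every socket inherited from the parent's list. */
static void close_inherited_socks(void)
{
    for (int i = 0; i < MAXKIDS; i++) {
        if (parent_socks[i] != -1) {
            close(parent_socks[i]);
            parent_socks[i] = -1;
        }
    }
}

/*
 * Fork MAXKIDS children, each with its own socket back to the parent.
 * Returns the largest number of inherited parent sockets any child still
 * had open, or -1 on error (error paths do no cleanup; this is a sketch).
 */
static int spawn_children(int do_cleanup)
{
    pid_t pids[MAXKIDS];
    int worst = 0;

    for (int i = 0; i < MAXKIDS; i++)
        parent_socks[i] = -1;

    for (int i = 0; i < MAXKIDS; i++) {
        int sv[2];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
            return -1;
        if ((pids[i] = fork()) == -1)
            return -1;

        if (pids[i] == 0) {
            int inherited[MAXKIDS];
            unsigned char leaked = 0;

            /* Remember which descriptors were inherited before closing anything. */
            memcpy(inherited, parent_socks, sizeof inherited);
            close(sv[0]);                   /* the parent's end of our own pair */
            if (do_cleanup)
                close_inherited_socks();

            /* Count inherited sockets that are still open in this child. */
            for (int j = 0; j < MAXKIDS; j++)
                if (inherited[j] != -1 && fcntl(inherited[j], F_GETFD) != -1)
                    leaked++;
            if (write(sv[1], &leaked, 1) != 1)
                _exit(1);
            close(sv[1]);
            _exit(0);
        }

        close(sv[1]);                       /* the child's end is not needed here */
        parent_socks[i] = sv[0];
    }

    /* Collect each child's report and reap it. */
    for (int i = 0; i < MAXKIDS; i++) {
        unsigned char leaked = 0;

        if (read(parent_socks[i], &leaked, 1) == 1 && leaked > worst)
            worst = leaked;
        close(parent_socks[i]);
        parent_socks[i] = -1;
        waitpid(pids[i], NULL, 0);
    }
    return worst;
}
```

Without the cleanup, the i-th child starts with i extra open sockets that belong to its siblings' channels, so the count grows with every fork; with the cleanup, each child holds only its own end.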