I've run into a strange problem where a huge number of messages from snmplib's snmp_synch_response() are managing to fill up a 60GB hard drive within about three hours. The messages are all "Use snmp_sess_select_info2() for processing large file descriptors", sometimes repeated over a hundred times per line. I'm still working with the customer to figure out how to reproduce this in-house, but I thought I'd ask here in case it was an old issue or, at least, seen by somebody else in some fashion.
Here's the basic system info: FreeBSD 8.1-RELEASE-p2 on i386, with NET-SNMP 5.5.
Below is a simplified snippet of the key parts of my code. The code first makes a list of tasks with initialized, but not open, sessions. Elsewhere, each task, up to a small limit (64 in this case), is forked and the children open the SNMP session sockets with snmp_open(), and so on. I've scoured each of set(), get(), and getnext(), and am sure that they all call snmp_close() appropriately — there aren't any early returns or other jumps over those calls — so I don't think that I'm explicitly leaking any sockets, but descriptors must be hanging around for some reason. Does this ring any bells for anybody?
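// Parent: build the task list; sessions are initialized but not opened here.
for(…){
    …
    snmp_sess_init(&task->sess_info);
    addtask(taskList, task);
    …
}
…
// Parent: fork one child per task, up to maxkids.
for(task = taskList; task && nkids < maxkids; task = task->next){
    if(fork() == 0){
        // Child: each call opens and closes its own session.
        set(task);
        get(task);
        getnext(task);
        …
    }
    nkids++;
}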
void set(Task *task){
    …
    sess = snmp_open(&task->sess_info);
    …
    pdu = snmp_pdu_create(SNMP_MSG_SET);
    …
    status = snmp_synch_response(sess, pdu, &resp);
    // check return, retr
    snmp_close(sess);
}
void get(Task *task){
    …
    sess = snmp_open(&task->sess_info);
    …
    pdu = snmp_pdu_create(SNMP_MSG_GET);
    …
    status = snmp_synch_response(sess, pdu, &resp);
    // check return, read variables
    snmp_close(sess);
}
void getnext(Task *task){
    …
    sess = snmp_open(&task->sess_info);
    // one session is shared by all of the task's objects
    for(obj = task->objs; obj; obj = obj->next){
        …
        pdu = snmp_pdu_create(SNMP_MSG_GETNEXT);
        …
        status = snmp_synch_response(sess, pdu, &resp);
        // check return, read variables
    }
    snmp_close(sess);
}
In case anybody runs into something similar: this (unsurprisingly) turned out to have nothing to do with net-snmp. Each child process communicates back to the parent over its own socket. Because fork() duplicates every open descriptor, each child was also inheriting the parent's whole list of sockets to the other children, which would explain the complaints about large file descriptors. The solution was simply to close the sockets in that list in the child code.
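
For anyone who wants to see the shape of that fix, here is a minimal sketch, not my actual code. Channel, close_inherited(), spawn_worker() and report_lowest_free_fd() are names made up for illustration, and socketpair() is only an assumption about how the parent/child socket gets created. The child closes its copies of the parent's sockets before it does any SNMP work.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical bookkeeping: the parent's end of the socket to one child. */
typedef struct Channel {
    int fd;
    struct Channel *next;
} Channel;

/* Close every open descriptor in the list; returns how many were closed. */
int close_inherited(Channel *list)
{
    int closed = 0;
    for (Channel *c = list; c != NULL; c = c->next) {
        if (c->fd >= 0 && close(c->fd) == 0) {
            c->fd = -1;
            closed++;
        }
    }
    return closed;
}

/*
 * Fork one worker that reports back over a socketpair (an assumption).
 * *list holds the parent's ends for earlier children. On success the new
 * parent's end is stored in slot and pushed onto the list.
 * Returns the child's pid in the parent, or -1 on error.
 */
pid_t spawn_worker(Channel **list, Channel *slot, void (*work)(int fd))
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: drop the inherited copies of every earlier child's
         * socket, plus the parent's end of our own pair. */
        close_inherited(*list);
        close(sv[0]);
        work(sv[1]);        /* e.g. set(), get(), getnext() */
        _exit(0);
    }

    /* Parent: keep only its own end and remember it. */
    close(sv[1]);
    slot->fd = sv[0];
    slot->next = *list;
    *list = slot;
    return pid;
}

/* Example worker: sends the parent the lowest descriptor number that is
 * still free in this child (dup() always returns the lowest one). */
void report_lowest_free_fd(int fd)
{
    int probe = dup(fd);
    if (write(fd, &probe, sizeof probe) != (ssize_t)sizeof probe)
        _exit(1);
    close(probe);
}
```

With close_inherited() in place, every worker should report the same lowest free descriptor no matter how many children were forked before it. Take that call out and the reported number goes up by one with each earlier child, which is the kind of growth that was biting me.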