for (; 1;) {
if (fork() == 0) break;
int sig = 0;
for (; 1; usleep(10000)) {
pid_t wpid = waitpid(g->pid[1], &sig, WNOHANG);
if (wpid > 0) break;
if (wpid < 0) print("wait error: %s\n", strerror(errno));
}
}
waitpid
should return the pid of child process immediately!
But waitpid
got the pid number after about 90 seconds,
cube 28139 0.0 0.0 70576 900 ? Ss 04:24 0:07 ./daemon -d
cube 28140 9.3 0.0 0 0 ? Zl 04:24 106:19 [daemon] <defunct>
strace -p 28139
Process 28139 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
wait4(28140, 0x7fff08a2681c, WNOHANG, NULL) = 0
nanosleep({0, 10000000}, NULL) = 0
wait4(28140, 0x7fff08a2681c, WNOHANG, NULL) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
restart_syscall(<... resuming interrupted call ...>) = 0
wait4(28140, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], WNOHANG, NULL) = 28140
I finally find out there were some fd leaks during deep tracing by lsof.
After fd leaks were fixed, the problem was gone.