Pclose seems to make process fail

This question is a follow up of this question : Controlling a C daemon from another program

My goal is to control daemon process execution from another program.
The daemon's code is really simple.

int main()
{
  printf("Daemon starting ...\n");
  openlog("daemon-test", LOG_PID, LOG_DAEMON);

  syslog(LOG_INFO, "Daemon started !\n");

  while(1)
  {
    syslog(LOG_INFO, "Daemon alive - pid=%d, pgid=%d\n", getpid(), getpgrp());
    sleep(1);
  }

  return EXIT_SUCCESS;
}

I have implemented a SystemV init script for this daemon as follow

#!/bin/sh

NAME=daemon-test
DAEMON=/usr/bin/${NAME}
SCRIPTNAME=/etc/init.d/${NAME}
USER=root
RUN_LEVEL=99
PID_FILE=/var/run/${NAME}.pid
RETRY=3

start_daemon()
{
    start-stop-daemon --start --background --name ${NAME} --chuid ${USER} --nicelevel ${RUN_LEVEL} --make-pidfile --pidfile ${PID_FILE} --exec ${DAEMON}
    ret=$?

    if [ "$ret" -eq 0 ]; then
        echo "'${NAME}' started"
    elif [ "$ret" -eq 1 ]; then
        echo "'${NAME}' is already running"
    else
        echo "An error occured starting '${NAME}'"
    fi
    return ${ret}
}

stop_daemon()
{
    start-stop-daemon --stop --retry ${RETRY} --remove-pidfile --pidfile ${PID_FILE} --name ${NAME} --signal 9
    ret=$?

    if [ "$ret" -eq 0 ]; then
        echo "'${NAME}' stopped"
    elif [ "$ret" -eq 1 ]; then
        echo "'${NAME}' is already stopped"
    elif [ "$ret" -eq 2 ]; then
        echo "'${NAME}' not stopped after ${RETRY} tries"
    else
        echo "An error occured stopping '${NAME}'"
    fi
    return ${ret}
}

status_daemon()
{
    start-stop-daemon --status --pidfile ${PID_FILE} --name ${NAME}
    ret=$?

    if [ "$ret" -eq 0 ]; then
        echo "'${NAME}' is running"
    elif [ "$ret" -eq 1 ]; then
        echo "'${NAME}' stopped but pid file exits"
    elif [ "$ret" -eq 3 ]; then
        echo "'${NAME}' stopped"
    elif [ "$ret" -eq 4 ]; then
        echo "Unable to get '${NAME}' status"
    else
        echo "Unknown status : ${ret}"
    fi
    return ${ret}
}

case "$1" in
  start)
    echo "Starting '${NAME}' deamon :"
    start_daemon
    ;;
  stop)
    echo "Stopping '${NAME}' deamon :"
    stop_daemon
    ;;
  status)
    echo "Getting '${NAME}' deamon status :"
    status_daemon
    ;;
  restart|reload)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart}"
    exit 1
esac

exit $?

Using this script from command line to control the daemon execution works well.

So the aim now is to use this script from another c program to launch the daemon and to control its execution from this program.

I have implemented a simple C program which:

Launch the script with 'start' argument
Wait for pid file creation
Read daemon's pid from pid file
Periodically check that daemon is alive by checking existence of file /proc/<daemon_pid>/exec
If daemon is killed, relaunch it

And here is the issue I'm facing. The program works well only if I don't call pclose.

Here is the code of the program

#define DAEMON_NAME       "daemon-test"
#define DAEMON_START_CMD  "/etc/init.d/" DAEMON_NAME " start"
#define DAEMON_STOP_CMD   "/etc/init.d/" DAEMON_NAME " stop"
#define DAEMON_PID_FILE   "/var/run/" DAEMON_NAME ".pid"

int main()
{
    char daemon_proc_path[256];
    FILE* daemon_pipe = NULL;
    int daemon_pid = 0;
    FILE* fp = NULL;
    int ret = 0;
    int i = 0;

    printf("Launching '%s' program\n", DAEMON_NAME);
    if(NULL == (daemon_pipe = popen(DAEMON_START_CMD, "r")))
    {
        printf("An error occured launching '%s': %m\n", DAEMON_START_CMD);
        return EXIT_FAILURE;
    }
    #ifdef USE_PCLOSE
    else if(-1 == (ret = pclose(daemon_pipe)))
    {
        printf("An error occured waiting for '%s': %m\n", DAEMON_START_CMD);
        return EXIT_FAILURE;
    }
    #endif
    else
    {
        printf("Script exit status : %d\n", ret);

        while(0 != access(DAEMON_PID_FILE, F_OK))
        {
            printf("Waiting for pid file creation\n");
            sleep(1);
        }
        if(NULL == (fp = fopen(DAEMON_PID_FILE, "r")))
        {
            printf("Unable to open '%s'\n", DAEMON_PID_FILE);
            return EXIT_FAILURE;
        }
        fscanf(fp, "%d", &daemon_pid);
        fclose(fp);
        printf("Daemon has pid=%d\n", daemon_pid);
        sprintf(daemon_proc_path, "/proc/%d/exe", daemon_pid);
    }

    while(1)
    {
        if(0 != access(daemon_proc_path, F_OK))
        {
            printf("\n--- Daemon (pid=%d) has been killed ---\n", daemon_pid);
            printf("Relaunching new daemon instance...\n");
            if(NULL == (daemon_pipe = popen(DAEMON_START_CMD, "r")))
            {
                printf("An error occured launching '%s': %m\n", DAEMON_START_CMD);
                return EXIT_FAILURE;
            }
            #ifdef USE_PCLOSE
            else if(-1 == (ret = pclose(daemon_pipe)))
            {
                printf("An error occured waiting for '%s': %m\n", DAEMON_START_CMD);
                return EXIT_FAILURE;
            }
            #endif
            else
            {
                printf("Script exit status : %d\n", ret);

                while(0 != access(DAEMON_PID_FILE, F_OK))
                {
                    printf("Waiting for pid file creation\n");
                    sleep(1);
                }
                if(NULL == (fp = fopen(DAEMON_PID_FILE, "r")))
                {
                    printf("Unable to open '%s'\n", DAEMON_PID_FILE);
                    return EXIT_FAILURE;
                }
                fscanf(fp, "%d", &daemon_pid);
                fclose(fp);
                printf("Daemon has pid=%d\n", daemon_pid);
                sprintf(daemon_proc_path, "/proc/%d/exe", daemon_pid);
            }
        }
        else
        {
            printf("Daemon alive (pid=%d)\n", daemon_pid);
        }
        sleep(1);
    }

    return EXIT_SUCCESS;
}

From what I understood pclose is supposed to wait for child process termination and only when the child process has returned, it closes the pipe.

So I don't understand why my implementation with pclose doesn't work when it works without calling it.

Here are the logs with and without the pclose block commented

Without pclose calling:

# ./popenTest 
Launching 'daemon-test' program
Script exit status : 0
Waiting for pid file creation
Daemon has pid=435
Daemon alive (pid=435)
Daemon alive (pid=435)
Daemon alive (pid=435)
Daemon alive (pid=435)

With pclose calling:

# ./popenTest 
Launching 'daemon-test' program
Script exit status : 36096
Waiting for pid file creation
Waiting for pid file creation
Waiting for pid file creation
Waiting for pid file creation

As you can see, the daemon is never launched and the pid file is never created neither.

Even if my program works without pclose I would like to understand the underlying issue with the call to pclose.

Why using pclose makes the program fail when the behaviour is good without calling it ?

EDIT:

Here are some more information for the error case

errno is Success
WIFEXITED macro returns true
WEXITSTATUS macro returns 141

By going further into debugging, I have remarqued that modifying the init script to log output to a file makes it work... why ?

Solution

You use popen(DAEMON_START_CMD, "r"). That means your 'daemon watcher' is reading the standard output of your 'daemon starter' script. If you pclose() that pipe, the script writes to standard output and gets a SIGPIPE because the read end of the pipe is closed. Whether that happens before the actual daemon is started or not is open to debate — and timing issues.

Don't pclose() the pipe until you know the daemon starter has exited, by some means or other. Personally, I'd use pipe(), fork() and execv() (or some other variant of the exec family of functions directly. I don't think popen() is the right tool for the job. But if you're going to use popen(), then read the input until you get no more (EOF), then use pclose() safely. You don't have to print what you read, though it would be conventional and sensible to do so — the 'daemon starter' script is telling you useful information.

The classic way to check whether a process ID is still running is to use kill(daemon_pid, 0). If the process executing that is appropriately privileged (same UID as the process, or root privileges), this works. It won't help if you can't send an active signal to the PID.

(I assume start-stop-daemon is a program, probably a C program rather than a shell script, that launches another program as a daemon. I have a similar program that I call daemonize — and it too is intended for convert programs not specifically designed as daemons into a program running as a daemon. Many programs don't work well as daemons — consider what daemonizing ls, grep, ps, or sort would mean. Other programs can more sensibly be run as daemons.)