Search code examples
flushrsyslog

wait synchronously for rsyslog flush to complete


I am running rsyslogd 8.24.0 with a local logfile.

I have a test which runs a program that does some syslog logging (with entries from my test going to another file via rsyslog.conf setting) then exits back to a shell script to check the log has expected content. This usually works but sometimes fails as though the logging hadn't happened. I've added a flush (using HUP signal) to the shell script before it does the check. I can see that the HUP has happened and that the correct entry is in the log, but the script's check still fails. Is there a way for the shell script to wait until the flush has completed? I can add an arbitrary sleep but would prefer to have something more definite.

Here are the relevant bits of the shell script:

# Set syslog to send dump_hook's logging to a local logfile...
sudo echo "user.*   `pwd`/dump_hook_log" >> /etc/rsyslog.conf
sudo systemctl restart rsyslog.service
echo "" > ./dump_hook_log

# run the test program which does syslog logging

kill -HUP `cat /var/run/syslogd.pid` # flush syslog 
if [ $? -ne 0 ]
then 
    logFail "failed to HUP `cat /var/run/syslogd.pid`: $?"
fi
echo "sent HUP to `cat /var/run/syslogd.pid`" 
grep <the string I want> ./dump_hook_log >/dev/null

The string in question is always in the dump_hook_log by the time that the test has reported fail and I've gone to look at it. I presume it must be that the flush hasn't completed by the time of the grep.

Here is an example: In /var/log/messages

2019-01-30T12:13:27.216523+00:00 apx-ont-1 apx_dump_hook[28279]: Failed to open raw dump file "core" (Is a directory)
2019-01-30T12:13:27.216754+00:00 apx-ont-1 rsyslogd: [origin software="rsyslogd" swVersion="8.24.0" x-pid="28185" x-info="http://www.rsyslog.com"] rsyslogd was HUPed

Mod date of the log file (n.b. this is earlier than the entries it contains!):

rw-rw-rw- 1 nealec appexenv1_group 2205 2019-01-30 12:13:27.215053296 +0000 testdir_OPT/dump_hook_log

Last line of the log file (only apx_dump_hook entries in here):

2019-01-30T12:13:27.216523+00:00 apx-ont-1 apx_dump_hook[28279]: Failed to open raw dump file "core" (Is a directory)

Script reporting error:

Wed 30 Jan 12:13:27 GMT 2019 PSE Test 0.2b FAILED: 'Failed to open raw dump file' not found in ./dump_hook_log

Solution

  • I think I understand this now. The HUP causes rsyslogd to close its open files but it doesn’t reopen a file until it needs to log to it. Consider the following: I use inotify to wait for a file to close, like this:

    case 9:
        {
            // Wait for the file, specified in argv[2], to be closed
            int inotfd = inotify_init();           
            if (inotfd < 0) {
                printf("inotify_init failed; errno %d: %s\n",
                    errno, strerror(errno));
                exit(99);
            }
            int watch_desc = inotify_add_watch(inotfd, argv[2], IN_CLOSE);
            if (watch_desc < 0) {
                printf("can't watch %s failed; errno %d: %s\n",
                    argv[2], errno, strerror(errno));
                exit(99);
            }
            size_t bufsiz = sizeof(struct inotify_event) + PATH_MAX + 1;
            struct inotify_event* event = static_cast<inotify_event*>(malloc(bufsiz));
            if (!event) {
                printf("Failed to malloc event buffer; errno %d: %s\n",
                    errno, strerror(errno));
                exit(99);
            }
    
            /* wait for an event to occur with blocking read*/
            read(inotfd, event, bufsiz);
        }
    

    Then in my shell script I wait for that:

    # Start a process that waits for the log file be closed
    ${bin}/test_dump_hook.exe 9 "./dump_hook_log" &  
    wait_pid=$!
    
    # Signal syslogd to cause it it close/reopen its log files
    kill -HUP `cat /var/run/syslogd.pid` # flush syslog 
    if [ $? -ne 0 ]
    then 
        logFail "failed to HUP `cat /var/run/syslogd.pid`: $?"
    fi
    wait $waid_pid
    

    I find this never returns. Sending a HUP to rsyslogd from another process doesn't break it out of the wait either, but a cat (which does open/close the file) of the log file does.

    That’s because the HUP in the shell script was done before the other process waited for it. So the file was already closed at the start of the wait, and because there is no more logging to that file it is not reopened and doesn’t need to close when any subsequent HUPs are received, so the event never occurs to end the wait.

    Having understood this behaviour how can I be sure that the log has been written before I check it? I've gone with this solution; put a known message into the log and wait until that appears, I know that the entries I'm waiting for must be before that. Like this:-

    function flushSyslog
    {
        logger -p user.info -t dump_hoook_test "flushSyslog"
        # Signal syslogd to cause it it close its log file
        kill -HUP `cat /var/run/syslogd.pid` # flush syslog 
        if [ $? -ne 0 ]
        then 
            logFail "failed to HUP `cat /var/run/syslogd.pid`: $?"
        fi
        # wait upto 10 secs for the entry we've just logged to appear
        sleeps=0 
        until     
            grep "flushSyslog" ./dump_hook_log > /dev/null 
        do  
            sleeps=$((sleeps+1))
            if [ $sleeps -gt 100 ]
            then
                logFail "failed to flush syslog dump_hook_log"
            fi
            sleep 0.1
        done        
    }