linux bash wait nohup

Checking that a process launched via nohup has started successfully


On one of my servers, I have a script that, at one stage, launches tcpdump in the background through nohup.

start_dump() {
    2>&1 /usr/bin/nohup /usr/sbin/tcpdump -s 0 -i $IFACE host $HOST -C 1000 -w $DUMP_DIR/$LOGIN/$DATE\_$HOST.pcap | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }' >> /var/log/dump/nohup_$LOGIN.out &
}

I need to make sure that everything went well and that the dump is being written. To do this, I check whether the process shows up in ps, but sometimes the check fails even though the process is there.

dump_check() {
    ps u -C tcpdump | grep -F -- "$HOST" > /dev/null
}

For debugging, I wrapped the check in a loop, because I suspected that the dump had not yet started by the time the condition was checked.

start_dump() {
    2>&1 /usr/bin/nohup /usr/sbin/tcpdump -s 0 -i $IFACE host $HOST -C 1000 -w $DUMP_DIR/$LOGIN/$DATE\_$HOST.pcap | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }' >> /var/log/dump/nohup_$LOGIN.out &
}

dump_check_check() {
    ps u -C tcpdump | grep $HOST
    echo $?
}

...

                start_dump
                for run in {1..10}; do
                    dump_check_check
                done

Apparently I was right. This is the trace I get:

+ start_dump
+ for run in {1..10}
+ dump_check_check
+ grep 172.x.x.x
+ ps u -C tcpdump
+ awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; fflush(); }'
+ /usr/bin/nohup /usr/sbin/tcpdump -s 0 -i ppp0 host x.x.x.x -C 1000 -w /root/dumps/xxxx/2021-01-21_17:31:51_172.19.5.234.pcap
+ echo 1
1
+ for run in {1..10}
+ dump_check_check
+ grep 172.x.x.x
+ ps u -C tcpdump
+ echo 1
1
+ for run in {1..10}
+ dump_check_check
+ grep 172.x.x.x
+ ps u -C tcpdump
root       768  0.0  0.0  10020  1468 pts/0    D+   17:31   0:00 /usr/sbin/tcpdump -s 0 -i ppp0 host 172.x.x.x -C 1000 -w /root/dumps/xxxx/2021-01-21_17:31:51_172.19.5.234.pcap
+ echo 0
0

First, why does the dump only start executing after the condition has already been checked? Second, even after the launch, the next check also fails. As I understand it, this is because the command goes through nohup and tcpdump has not started yet by the time of the next check. On the third attempt, everything works.
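Here is a plausible explanation for the first two failures, offered as an assumption rather than a confirmed diagnosis. The trailing & sends the whole pipeline to the background, so the shell moves on to dump_check_check immediately. The + lines from the background processes are then printed in whatever order those processes happen to be scheduled. In addition, ps -C selects processes by command name, and the background process is only called tcpdump once nohup has exec'd it. Before that, it still carries the name of the forked shell, and then the name nohup.

The sketch below illustrates that transition, with the following assumptions:

- sleep stands in for tcpdump.
- The process name is read from /proc/PID/comm, which is Linux-specific.
- $! is the PID of the nohup process, which keeps the same PID when nohup replaces itself with its target. In the question's pipeline, $! would instead be the PID of awk, the last command in the pipeline.

Depending on timing, the first attempts may print bash or nohup before sleep appears.

```shell
#!/usr/bin/env bash
# Demo: watch a background job's command name change as nohup execs its target.
# 'sleep 3' is a stand-in for tcpdump; /proc/PID/comm is Linux-specific.

nohup sleep 3 >/dev/null 2>&1 &
pid=$!   # nohup execs sleep, so this PID ends up belonging to sleep

name=""
for attempt in {1..50}; do
    # comm holds the current command name, which is what 'ps -C' matches
    name=$(cat "/proc/$pid/comm" 2>/dev/null)
    echo "attempt $attempt: ${name:-<no such process>}"
    [[ $name == sleep ]] && break
    sleep 0.01
done

kill "$pid" 2>/dev/null   # clean up the stand-in process
```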

Question: the obvious fix seems to be adding a delay before the check, but a fixed sleep doesn't suit me. Sometimes the check succeeds on the first attempt and sometimes only on the fifth, and I can't afford to waste that much time because timing is critical here. I am looking for a solution where the success check runs repeatedly until it succeeds, but for no longer than a specific time. If this time expires, an error should appear.

P.S. I hope I haven't gone overboard with the details. This is my first question here. Thanks in advance, friends!


Solution

  • I am looking for a solution where the success check runs repeatedly until it succeeds, but for no longer than a specific time. If this time expires, an error should appear.

    You can always use something like this:

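    # Poll up to 10 times, 0.1 s apart, until a tcpdump process whose
    # ps line mentions $HOST shows up; return 1 if it never does.
    check_dump()
    {
        for run in {1..10}
        do  sleep .1
            # -q: print nothing; -F: match the IP's dots literally, not as regex
            ps u -C tcpdump | grep -qF -- "$HOST" && return 0
        done
        return 1
    }

    start_dump
    if check_dump; then echo SUCCESS; else echo ERROR; fi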
    

    This will run no longer than about one second (the time of ps | grep should be negligible). You can adjust the maximum number of checks and the interval between them at will.
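If the limit should be expressed as wall-clock time rather than a number of attempts, a variant along the following lines could work. retry_until and tcpdump_running are hypothetical helper names. The sketch assumes GNU date (for the %3N millisecond format) and a sleep that accepts fractional seconds, both of which are standard on typical Linux systems. The deadline is only checked between attempts, so the total time can overshoot by up to one interval plus the duration of one check.

```shell
#!/usr/bin/env bash
# retry_until is a hypothetical helper, not a standard command.
# Usage: retry_until TIMEOUT_MS INTERVAL_S COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0, or until roughly TIMEOUT_MS
# milliseconds of wall-clock time have passed (then returns 1).
# Assumes GNU date (%3N = milliseconds) and a sleep accepting fractions.
retry_until() {
    local timeout_ms=$1 interval=$2
    shift 2
    local start now
    start=$(date +%s%3N)
    while true; do
        "$@" && return 0                              # success: stop at once
        now=$(date +%s%3N)
        (( now - start >= timeout_ms )) && return 1   # deadline passed
        sleep "$interval"
    done
}

# Hypothetical check reusing the ps | grep test; HOST is set as in the question.
tcpdump_running() {
    ps u -C tcpdump | grep -qF -- "$HOST"
}

# Usage with the question's start_dump (about one second at most):
#   start_dump
#   if retry_until 1000 0.05 tcpdump_running; then echo SUCCESS; else echo ERROR; fi
```

With a small interval such as 0.05 s, success is reported shortly after tcpdump appears, while the timeout still bounds the worst case.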