
Why can't my Linux shell program sometimes find a file it just created?


I have written a Linux shell program to keep my server information up to date. It works by creating a new file with the latest server information and comparing it with the old one. However, the program sometimes can't find the new file.

now=$(cd $(dirname $0);pwd)
while [ ! -s "$now/../var/service.info.temp" ];do
    sh "$now/get_service_info.sh" > "$now/../var/service.info.temp"
done
diff "$now/../var/service.info.temp" "$now/../var/service.info" >> "$now/../var/update.log" 2>> "$now/../var/update.log"
if [ $? -ne 0 ];then
    sh "$now/send_service_info.sh" >> "$now/../var/update.log" 2>> "$now/../var/update.log"
    cp "$now/../var/service.info.temp" "$now/../var/service.info"
fi
rm "$now/../var/service.info.temp"

This is what I found in update.log (cat is called inside send_service_info.sh):

diff: /home/(my user name)/network/sh/../var/service.info.temp: No such file or directory
cat: /home/(my user name)/network/sh/../var/service.info.temp: No such file or directory

The program is run every minute by cron:

# one line of my crontab
* * * * * sh /home/(my user name)/network/sh/check_update.sh

I have run the program by hand many times, and it never produces the error above when I do that. I only noticed the problem through the log and the emails that were sent by mistake. It does not happen on every run: when the program is run by cron, it happens roughly once every 20 minutes to one hour.

The while loop in the program was added recently to work around the missing-file problem, but it doesn't help; I still get the same result. I also tried adding an artificial delay to get_service_info.sh (sleep 5s), but the program still runs fine when I run it by hand. I have simplified get_service_info.sh to the single command hostname -I, but the problem persists.

My system is Ubuntu 22.04.3 LTS (Linux 4.14.141+), run by UserLand on an Android phone that has not been rooted. I connect to it over SSH to do my work. I have also found that the system is paused when the screen is off between midnight and 6 AM, which may be linked to the problem above (in both cases cron is unresponsive).

Following the suggestions, I changed the program code to:

set -e

now=$(cd $(dirname $0);pwd)
get="$now/get_service_info.sh"
send="$now/send_service_info.sh"
var="$now/../var"
log="$var/update.log"
origin="$var/service.info"
temp="$var/service.info.temp.$$"

exec >> $log 2>> $log

while [ ! -s $temp ];do
    sh $get > $temp
done
if [[ $(diff $origin $temp) ]];then
    sh $send
    cp $temp $origin
fi
rm $temp

I have now waited long enough, and the error has not appeared again. The cause was the long execution time. I had overlooked this because, when I test the program over an SSH connection, it takes less than one second. Perhaps the system becomes much slower when the program runs unattended, because of battery-saving mode or something similar. Finally, for readability and robustness, here is the code after the change:
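To confirm that runs really overlap under cron, one option is to log when each run starts and how long it takes. The sketch below is an assumption, not part of the original script: it tags each line with the run's PID ($$) and uses date +%s (supported by GNU date on Ubuntu) to compute the elapsed seconds. A run lasting a minute or more, or a "start" line from one PID appearing before the "end" line of another, means the runs overlap.

```shell
#!/bin/sh
# Hypothetical timing wrapper around the body of check_update.sh.
# Every line carries this run's PID ($$), so overlapping runs show up
# as a "start" from one PID before the "end" of another.
start=$(date +%s)
echo "$(date '+%Y-%m-%d %H:%M:%S') [$$] start"

# ... existing body of check_update.sh goes here ...

end=$(date +%s)
elapsed=$((end - start))
echo "$(date '+%Y-%m-%d %H:%M:%S') [$$] end after ${elapsed}s"
```

Placed after the exec line of the script, these lines end up in update.log together with the rest of the output.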

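set -e

# Directory containing this script, independent of cron's working directory
now=$(cd "$(dirname "$0")" && pwd)
get="$now/get_service_info.sh"
send="$now/send_service_info.sh"
var="$now/../var"
log="$var/update.log"
origin="$var/service.info"
# $$ is this shell's PID, so every run gets its own temp file
temp="$var/service.info.temp.$$"

# Append stdout and stderr of everything below to the log
exec >> "$log" 2>&1

# Retry until get_service_info.sh produces a non-empty file
while ! test -s "$temp" ; do
    sh "$get" > "$temp"
done
# diff -q exits 0 if the files match, 1 if they differ and 2 on trouble
# (e.g. $origin does not exist yet); any non-zero status triggers an update
if ! diff -q "$origin" "$temp" > /dev/null ; then
    sh "$send" "$temp"
    cp "$temp" "$origin"
fi
rm "$temp"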
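One remaining gap: with set -e, a failure in get_service_info.sh aborts the script before the final rm, and because each run now has its own $$ file name, such leftovers accumulate in var/. A common remedy, offered here as a suggestion rather than part of the original code, is an EXIT trap. The sketch demonstrates it in isolation: it writes a cut-down, hypothetical child.sh into a throwaway mktemp -d directory standing in for var/, runs it once successfully and once with a failing collector, and counts the temp files left behind.

```shell
#!/bin/sh
# Throwaway directory standing in for var/ (assumption for this demo).
var=$(mktemp -d)

# Cut-down check_update.sh: only the temp-file handling.
# $1 = directory for the temp file, $2 = command standing in for get_service_info.sh
cat > "$var/child.sh" <<'EOF'
set -e
temp="$1/service.info.temp.$$"
# The EXIT trap fires on a normal exit and when set -e aborts the script.
trap 'rm -f "$temp"' EXIT
sh -c "$2" > "$temp"
cat "$temp"
EOF

sh "$var/child.sh" "$var" 'echo ok'          # prints "ok"
sh "$var/child.sh" "$var" 'exit 3' || true   # collector fails; set -e aborts

# Count temp files left behind by either run.
leftover=0
for f in "$var"/service.info.temp.*; do
    if [ -e "$f" ]; then leftover=$((leftover + 1)); fi
done
echo "leftover temp files: $leftover"        # 0: the trap removed both
rm -rf "$var"
```

In the real script, the equivalent is a single trap 'rm -f "$temp"' EXIT line right after temp is assigned.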

Solution

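  • If your script, for whatever reason, takes approximately one minute or more to execute, you have a race condition: cron starts a new process before the old one has finished. Because both processes use the same file name, the new process's while loop can find the old process's non-empty file and skip creating its own; the rm in the old process then removes that file, perhaps just before the diff in the new process gets executed.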

    I suggest that you create a process-specific temp file:

    service_info_temp=$now/../var/service.info.temp.$$
    while [ ! -s "$service_info_temp" ]
    do
      ....
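The per-process file name stops one run from deleting another run's file, but two runs can still execute at the same time. If overlap should be avoided altogether, a widely used alternative, not part of the answer above, is to hold a lock for the whole run with flock(1) from util-linux, which Ubuntu ships. The lock file path below is a hypothetical choice; any path writable by the cron user would work. With -n, flock gives up immediately instead of waiting, so a run that is still busy causes the next one to be skipped rather than queued.

```shell
#!/bin/sh
# Hypothetical lock file; any path writable by the cron user works.
lock="${TMPDIR:-/tmp}/check_update.lock"

# Open the lock file on file descriptor 9 for the lifetime of this script.
exec 9> "$lock"

# -n: fail at once instead of waiting if another run holds the lock.
if ! flock -n 9; then
    echo "previous run still active; skipping" >&2
    exit 0
fi

# ... body of check_update.sh runs here while fd 9 holds the lock ...
# The lock is released automatically when the script exits and fd 9 closes.
```

The same kind of lock can instead be taken in the crontab entry, without editing the script (use one approach or the other, not both with the same lock file):

* * * * * flock -n /tmp/check_update.lock sh /home/(my user name)/network/sh/check_update.sh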