Tags: linux, bash, amazon-s3, zip, inotifywait

Can inotifywait say a file is closed before it is accessible to another process?


I have a system where one bash script is creating zip files in a given directory every few seconds. I have another bash script which is using inotifywait to check for these .zip files appearing, and uploads them to an Amazon S3 bucket using wget. This seems to generally work well.

However, very occasionally, the wget will fail, complaining that the .zip file doesn't exist. It seems that inotifywait is reporting that the file is present, while the file is not yet ready to be opened. I'm using the close_write event to detect when the file is present.

Here is the script that is waiting for the files, and uploading them:

#!/bin/bash

# Watches for new files appearing in a zipfiles directory, and uploads them
# to AWS.

# Process a single .zip file
# $1 = filename
# $2 = the number contained in the filename

process_zip_file( )
{
    wget --no-check-certificate \
         -O /dev/null \
         --method PUT \
         --timeout=0 \
         --header 'Content-Type: application/zip' \
         --body-file="$1" \
         "https://[redacted AWS endpoint]/${1}"

    if [[ $? -eq 0 ]]; then
        mv "$1" "$UPLOADED/"
    fi
}

# Wait for CLOSE_WRITE events in the watched directory, and extract the results
# into an array. aline[0] is the path, [1] is the event(s), [2] is the filename

inotifywait -m -e close_write "$WATCHDIR" | while read -r -a aline; do
    fname=${aline[2]}

    # Check it is of the form zip-XYZ.zip where XYZ is a number
    if [[ $fname =~ ^zip\-([[:digit:]]*)\.zip$ ]]; then
        process_zip_file "$fname" "${BASH_REMATCH[1]}"
    fi
done

Here is the error message I occasionally get:

--2023-02-09 23:21:25--  https://[redacted AWS endpoint]/zip-00002331.zip
BODY data file 'zip-00002331.zip' missing: No such file or directory
--2023-02-09 23:21:33--  https://[redacted AWS endpoint]/zip-00002332.zip
Resolving [redacted AWS endpoint]... [redacted IP addresses]
Connecting to [redacted AWS endpoint]|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [application/json]
Saving to: '/dev/null'

     0K                                                        0.00 =0s
                                   0.00 =0s

This shows a failure at 23:21:25 (No such file or directory), followed by a successful upload ~8 seconds later.

Can anyone explain what could be going on here?

I thought one workaround might be to sleep for a second before trying to upload the zip file. But who can say if 1 second is the right number? And I don't really want to insert arbitrary delays into the pipeline without understanding what is going on.

OS is Yocto running on an Nvidia Jetson. Kernel 4.9.253-l4t-r32.7. Filesystem is ext4.

Update: Adding details of how the .zip file is created.

The bash script which creates these .zip files is itself using inotifywait to check for a .txt file appearing (which is created by a C++ program). When a .txt file appears, it makes a zip file containing the .txt file and three .jpg files.

#!/bin/bash

# Watches for new files appearing in a data directory, and once a
# .txt file and three image files exist, zips them up into a zip file.

# Process a single .txt file
# $1 = filename
# $2 = the number contained in the filename

process_txt_file( )
{
    # Check if photos are all present
    if [[ -f pp-${2}-0.jpg && -f pp-${2}-1.jpg && -f pp-${2}-2.jpg ]] ; then
        echo "Got all the parts for $2"

        # Create the zip file
        zip -r "zipfiles/zip-${2}.zip" \
            "$1" "pp-${2}-0.jpg" "pp-${2}-1.jpg" "pp-${2}-2.jpg"

        if [[ $? -eq 0 ]]; then
            rm "$1" "pp-${2}-0.jpg" "pp-${2}-1.jpg" "pp-${2}-2.jpg"
        fi
    else
        echo "Not got all the parts for $2"
    fi
}

# Wait for CLOSE_WRITE events in the data directory, and extract the results
# into an array. aline[0] is the path, [1] is the event(s), [2] is the filename

inotifywait -m -e close_write "$DATA_DIR" | while read -r -a aline; do
    fname=${aline[2]}

    # Check it is of the form pb-XYZ.txt where XYZ is a number
    if [[ $fname =~ ^pb\-([[:digit:]]*)\.txt$ ]]; then
        process_txt_file "$fname" "${BASH_REMATCH[1]}"
    fi
done

The log of this script when the error occurs looks normal:

Got all the parts for 00002330
  adding: pb-00002330.txt (deflated 49%)
  adding: pp-00002330-0.jpg (deflated 4%)
  adding: pp-00002330-1.jpg (deflated 4%)
  adding: pp-00002330-2.jpg (deflated 3%)
Got all the parts for 00002331
  adding: pb-00002331.txt (deflated 49%)
  adding: pp-00002331-0.jpg (deflated 2%)
  adding: pp-00002331-1.jpg (deflated 3%)
  adding: pp-00002331-2.jpg (deflated 2%)
Got all the parts for 00002332
  adding: pb-00002332.txt (deflated 49%)
  adding: pp-00002332-0.jpg (deflated 2%)
  adding: pp-00002332-1.jpg (deflated 2%)
  adding: pp-00002332-2.jpg (deflated 3%)

Solution

  • I think I may have found the answer, by running zip via strace. It seems that zip initially creates the destination file, and then immediately deletes it. Then it creates a temporary file with the correct contents, and renames it to your chosen destination filename.

    I have no idea why it does this. Here is an example, from running zip -r /home/root/foo.zip blah.jpg... under strace:

    openat(AT_FDCWD, "/home/root/foo.zip", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
    close(3)                                = 0
    newfstatat(AT_FDCWD, "/home/root/foo.zip", {st_mode=S_IFREG|0644, st_size=0, ...}, 0) = 0
    unlinkat(AT_FDCWD, "/home/root/foo.zip", 0) = 0
    openat(AT_FDCWD, "/home/root/ziw022vF", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
    // Lots of zipping going on
    close(3)                                = 0
    newfstatat(AT_FDCWD, "/home/root/foo.zip", 0x7ff72948b8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
    renameat(AT_FDCWD, "/home/root/ziw022vF", AT_FDCWD, "/home/root/foo.zip") = 0
    

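    So it seems the close_write event my script receives comes from zip closing that initial, empty destination file, which zip then deletes straight away. If my script tries to open the file in the window between that delete and the rename of the temp file, the file does not exist.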

    I'll work around this by creating the zip files in a temp folder, and renaming them into the main data folder (where the inotifywait is looking for them).

    As @chrslg helpfully commented, a solution is to just watch for "moved_to" events, rather than "close_write", as they will catch the rename from the temp file to the real .zip file.
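
The moved_to approach could look roughly like the sketch below. It assumes inotifywait from inotify-tools is installed. The function names zip_number and watch_zip_dir are made up for illustration, and the handler argument stands in for something like process_zip_file from the question. The pattern uses + rather than the original * so that a name with no digits at all is rejected. This is only a sketch under those assumptions, not a tested replacement for the original script.

```shell
#!/bin/bash

# Print the numeric part of a name of the form zip-<digits>.zip.
# Returns non-zero for anything else, such as zip's temporary files.
zip_number() {
    local re='^zip-([[:digit:]]+)\.zip$'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

# Watch a directory for files renamed into it, and pass each matching
# filename and its number to a handler command.
# $1 = directory to watch
# $2 = handler (hypothetical; e.g. process_zip_file from the question)
watch_zip_dir() {
    local dir=$1 handler=$2 fname num

    # --format '%f' prints only the filename, one event per line, so the
    # loop does not need to split the line into an array.
    inotifywait -m -q -e moved_to --format '%f' "$dir" |
    while IFS= read -r fname; do
        if num=$(zip_number "$fname"); then
            "$handler" "$fname" "$num"
        fi
    done
}
```

It would be started with something like watch_zip_dir "$WATCHDIR" process_zip_file. The same watch should also cover the temp-folder workaround, because a rename within one filesystem is reported as a move event rather than as close_write.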