
How to identify whether a file is still being written or is complete, from a Linux script


We have a system that generates files. From a script, I want to check which of the many files are complete and have not been modified in the past two minutes, and then rename them.

This is what I tried but the result is not correct. Could someone help?

for file in /home/test/*abc_YYYYMMDDhhmmss*
do
    f1=`basename $file`
    if [ lsof | grep "$f1" = "" ];then
        if  [ `stat --format=%Y $file` -le $(( `date +%s` - 300 )) ]; then
        mv "$f1" "${f1}_Complete"
    else
       echo "no files to collect"
    fi
done

Solution

  • You are making the common mistake of assuming that [ is part of the if command's syntax; but it's not: [ is just another command. The syntax for an if statement is

    if commands; then
        : what to do if the exit code from commands was 0
    else
        : what to do if not
    fi
    

    where commands can be an arbitrarily complex sequence of commands, and the exit code from the last command in the sequence decides which branch to take; and the else branch is optional.
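
    To make the point about exit codes concrete, here is a small sketch. The helper name branch_for is made up for illustration; it runs whatever command it is given and reports which branch if took.

```shell
# branch_for COMMAND [ARGS...]: hypothetical helper, for illustration only.
branch_for() {
    # Run the arguments as a command; `if` inspects only its exit status.
    if "$@" >/dev/null 2>&1; then
        echo "then"
    else
        echo "else"
    fi
}

branch_for true                      # exit status 0       -> then
branch_for false                     # nonzero exit status -> else
branch_for [ -n "some text" ]        # [ is a command too  -> then
branch_for grep -q needle /dev/null  # no match, nonzero   -> else
```

    Nothing here treats [ specially: it is simply one more command whose exit status selects the branch, exactly like grep -q.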

    As a minimal fix, change to

        # use modern $(command substitution) syntax
        # instead of obsolescent `command substitution`;
        # always quote variables with file names
        f1=$(basename "$file")
        # Remove [ and switch to grep -q;
        # add -F to grep flags for literal matching
        if ! lsof | grep -Fq "$f1"; then
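
    Put together, the repaired loop might look roughly like the sketch below. Several things are assumptions and not part of the original question: the function name rename_idle_files, the use of find -mmin +2 for the age check (the question used stat with a 300-second cutoff), moving the file by its full path, and skipping names that already end in _Complete.

```shell
# rename_idle_files DIR PATTERN: hypothetical helper, not from the original post.
# Appends _Complete to files in DIR matching PATTERN that do not show up in
# lsof output and were last modified more than two minutes ago.
rename_idle_files() {
    dir=$1
    pattern=$2
    for file in "$dir"/$pattern; do
        [ -f "$file" ] || continue            # unmatched glob, or not a regular file
        case $file in
            *_Complete) continue ;;           # already renamed on an earlier run
        esac
        f1=$(basename "$file")
        # Skip files whose name appears in lsof output (still open somewhere).
        # If lsof is missing, nothing matches and the file counts as closed.
        if lsof 2>/dev/null | grep -Fq "$f1"; then
            continue
        fi
        # find prints the path only if the file is older than two minutes.
        if [ -n "$(find "$file" -mmin +2)" ]; then
            mv "$file" "${file}_Complete"
        fi
    done
}

# Example call (assumed paths, taken from the question):
# rename_idle_files /home/test '*abc_YYYYMMDDhhmmss*'
```

    This still runs lsof once per file, which is slow when there are many candidates.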
    

    Anyway, what about something like this instead?

    find $(lsof |
        awk 'NR==FNR { if ($9 ~ /^\/home\/test\//) a[$9]++; next }
        FNR == 1 {
            if (! (FILENAME in a)) print FILENAME;
            next }' - /home/test/*abc_YYYYMMDDhhmmss*) \
        -type f -mmin +2 -exec sh -c '
            for file; do
                mv "$file" "${file}_Complete"
            done' _ {} +
    

    This is pretty complex, but here's a rundown.

    • lsof | awk ... prints those wildcard matches which are not currently open.
      • This assumes that the files are regular text files - some Awk variants have trouble with binary input files. It would probably not be too hard to refactor this to avoid this constraint if it's problematic.
      • In some more detail, the first argument to Awk is - i.e. standard input, which reads the pipe from lsof. The condition NR==FNR is true for the first input file; we simply collect the open files into the associative array a. Then the second condition prints the name of the current input file if it's not in the array; this is executed for the remaining input files, i.e. those which match the wildcard.
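    • This is passed as the paths for find to examine; -mmin +2 selects those last modified more than two minutes ago, and find passes the result to the command in -exec. Beware that if the command substitution produces no paths at all, GNU find falls back to searching the current directory.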
    • The simple shell script in -exec should be easy to understand. find passes the found files as command-line arguments, but sh -c assigns the arguments after the script text starting from $0, so we pass in a dummy _ to push the file names into $1, $2, etc., which is what for loops over if you don't give it a list of arguments.
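
    The role of the dummy _ is easy to check in isolation. In this sketch the two function names are made up for illustration; both run the same inline script and differ only in whether a placeholder occupies $0.

```shell
# Hypothetical helpers, for illustration only.
without_placeholder() {
    # The first argument after the script text becomes $0, so the loop misses it.
    sh -c 'for file; do echo "arg: $file"; done' "$@"
}

with_placeholder() {
    # The dummy _ occupies $0; the real arguments start at $1.
    sh -c 'for file; do echo "arg: $file"; done' _ "$@"
}

without_placeholder one two three   # prints "arg: two" and "arg: three"
with_placeholder one two three      # prints all three arguments
```

    Without the placeholder, the first file found by find would be silently skipped.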

    This will probably not work if your file names contain newlines; then you'll need something more complex still.

    Looping over arbitrary file names is disappointingly complex in Bourne-family shells, and finding elements not in a list is always slightly pesky in shell script. Ksh and Bash offer some relief because they have arrays, but this is not portable to POSIX sh / ash / dash.
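
    For the "elements not in a list" part, one option that needs no arrays is to let grep do the set difference. This is only a sketch: the function name not_in_list and the sample paths are made up, and like the Awk version it assumes file names without newlines.

```shell
# not_in_list LISTFILE: hypothetical helper, for illustration only.
# Reads candidate names on stdin, one per line, and prints those that do
# not appear as a whole line in LISTFILE.
not_in_list() {
    # -F literal strings, -x whole-line match, -v invert, -f patterns from file
    grep -Fxv -f "$1"
}

# Sample data standing in for the list of open files.
open_list=$(mktemp)
printf '%s\n' /home/test/b_abc /home/test/d_abc > "$open_list"

# Prints /home/test/a_abc and /home/test/c_abc
printf '%s\n' /home/test/a_abc /home/test/b_abc /home/test/c_abc |
    not_in_list "$open_list"

rm -f "$open_list"
```

    In real use the list file would be filled from lsof output, for example from the same column the Awk script above reads.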