
How to delete images in folder based on filename conditions?


I need to delete JPG and jpg files based on the conditions below.

The folder has multiple JPG and jpg files. Each file is named like 172.30.165.212_20241231_132125.JPG, where 172.30.165.212 is the IP address, 20241231 is the date in YYYYMMDD format, and 132125 is the time in HHMMSS format.
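A minimal sketch of pulling those three fields out of a name and turning the date/time into epoch seconds. `parse_name` is a hypothetical helper; it assumes GNU `date` (for `-d`) and that the date/time in the name is local time. The extension is stripped before splitting on `_`, so it cannot end up glued to the time field.

```bash
#!/usr/bin/env bash
# Split a name such as 172.30.165.212_20241231_132125.JPG into IP, date and
# time, then convert the date/time (read as local time) to epoch seconds.
# parse_name is a hypothetical helper; date -d requires GNU date.

parse_name() {
    local name=${1##*/}     # drop any leading directory
    local stem=${name%.*}   # drop the extension first, so it can't stick to the time field
    local ip dt tm epoch
    IFS='_' read -r ip dt tm <<< "$stem"

    # Reshape YYYYMMDD + HHMMSS into 'YYYY-MM-DD HH:MM:SS' for date -d
    epoch=$(date -d "${dt:0:4}-${dt:4:2}-${dt:6:2} ${tm:0:2}:${tm:2:2}:${tm:4:2}" '+%s') || return 1
    printf '%s %s %s %s\n' "$ip" "$dt" "$tm" "$epoch"
}

# Prints the IP, date, time and a time-zone-dependent epoch value
parse_name "/mnt/moe/results/172.30.165.212_20241231_132125.JPG"
```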

The delete conditions are:
1- The script should always keep the most recent file per IP address, based on the date/time in its filename, no matter how old that date/time is.
2- Since each IP address can have multiple files, the script should delete all other files whose filename date/time is more than 2 hours older than the current time.
3- Never look at the file's modification date/time, only the one found in the name.
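The three rules boil down to a small predicate. In this sketch (`should_delete` is a hypothetical name), the caller passes the epoch taken from the file name, never the modification time, a 1/0 flag saying whether the file is the newest one for its IP, and the current epoch; an exit status of 0 means "delete". The demo values assume GNU `date`.

```bash
#!/usr/bin/env bash
# The three rules as a predicate. should_delete is a hypothetical helper.
#   $1  epoch taken from the file NAME (rule 3: never the modification time)
#   $2  1 if this is the newest file for its IP, else 0
#   $3  current epoch
# Exit status 0 = delete, 1 = keep.

should_delete() {
    local file_epoch=$1 is_newest=$2 now=$3
    (( is_newest )) && return 1          # rule 1: newest file per IP is always kept
    (( now - file_epoch > 7200 ))        # rule 2: delete if more than 2 hours old
}

# Demo with the question's example clock (requires GNU date for -d)
now=$(date -d '2024-12-31 13:30:00' '+%s')
old=$(date -d '2024-12-31 11:21:25' '+%s')
if should_delete "$old" 0 "$now"; then echo delete; else echo keep; fi   # prints "delete"
```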

I have tried the following script with no luck; no files get deleted:

#!/bin/bash
# Define target directory
TARGET_DIR="/mnt/moe/results"
# Log start time
echo "[$(date)] Starting cleanup process in $TARGET_DIR"

# Function to process files for a single IP
process_ip_files() {
    local ip_prefix=$1
    local ip_files
    # Find files matching the IP
    ip_files=$(find "$TARGET_DIR" -type f -iname "${ip_prefix}_*" | sort)

    # Skip if no files
    if [[ -z "$ip_files" ]]; then
        echo "No files found for IP: $ip_prefix"
        return
    fi

    echo "Processing IP: $ip_prefix"

    # Variables to track files and the most recent file
    local most_recent_file=""
    local most_recent_time=0
    local files_to_delete=()

    # Get current time in seconds since epoch
    current_time=$(date +%s)

    # Iterate over files to determine the most recent and deletion criteria
    while IFS= read -r file; do
        echo "Processing file: $file"

        # Remove the path and get just the file name
        base_file=$(basename "$file")
        echo "Base file name: $base_file"

        # Split file name into components
        IFS='_' read -r ip date time ext <<< "$base_file"
        
        # Validate the expected number of fields and format
        if [[ -z "$ip" || -z "$date" || -z "$time" || "$ext" != "JPG" && "$ext" != "jpg" ]]; then
            echo "  Skipping file (does not match expected format): $file"
            continue
        fi

        # Check the timestamp format (YYYYMMDD HHMMSS)
        if ! [[ "$date" =~ ^[0-9]{8}$ ]] || ! [[ "$time" =~ ^[0-9]{6}$ ]]; then
            echo "  Skipping file (invalid timestamp format): $file"
            continue
        fi

        # Convert to seconds since epoch
        timestamp="$date $time"
        file_time=$(date -d "$timestamp" +%s)

        echo "  File: $file"
        echo "    Timestamp: $timestamp"
        echo "    File time (epoch): $file_time"
        echo "    Current time (epoch): $current_time"

        # Check if this file is the most recent one for the IP
        if (( file_time > most_recent_time )); then
            # If we already have a most recent file, we add it to the delete list
            if [[ -n "$most_recent_file" ]]; then
                files_to_delete+=("$most_recent_file")
            fi
            most_recent_file="$file"
            most_recent_time="$file_time"
        else
            # Check if the file is older than 2 hours (7200 seconds)
            if (( current_time - file_time > 7200 )); then
                echo "    Marking for deletion: $file"
                files_to_delete+=("$file")
            fi
        fi
    done <<< "$ip_files"

    # Display the most recent file for this IP
    echo "Most recent file for IP $ip_prefix: $most_recent_file"

    # Deleting files not the most recent one
    if [[ ${#files_to_delete[@]} -gt 0 ]]; then
        echo "Files marked for deletion for IP $ip_prefix:"
        for file in "${files_to_delete[@]}"; do
            echo "  - $file"
        done

        for file in "${files_to_delete[@]}"; do
            if [[ "$file" != "$most_recent_file" ]]; then
                echo "Deleting file: $file"
                rm -v "$file"
            fi
        done
    else
        echo "No files to delete for IP $ip_prefix."
    fi
}

# Process unique IP addresses
find "$TARGET_DIR" -type f \( -iname "*.jpg" -o -iname "*.JPG" \) -printf "%f\n" | \
    awk -F'_' '{print $1}' | sort -u | while read -r ip; do
    process_ip_files "$ip"
done

# Log completion
echo "[$(date)] Cleanup process finished."

As an example, I have the following files, and the current date/time is 20241231 13:30:

172.30.165.212_20241231_132125.JPG  
172.30.165.212_20241231_122125.JPG  
172.30.165.212_20241231_112125.JPG  
172.30.165.212_20241231_102125.JPG  
172.30.165.212_20241231_092125.JPG  
172.30.165.213_20241231_062125.JPG  
172.30.165.213_20241231_032125.JPG  
172.30.165.213_20241231_012125.JPG  

The script should delete:

172.30.165.212_20241231_112125.JPG (older than 2 hours)  
172.30.165.212_20241231_102125.JPG (older than 2 hours)  
172.30.165.212_20241231_092125.JPG (older than 2 hours)  
172.30.165.213_20241231_032125.JPG (older than 2 hours)  
172.30.165.213_20241231_012125.JPG (older than 2 hours)  

The script should keep:

172.30.165.212_20241231_132125.JPG  (younger than 2 hours)  
172.30.165.212_20241231_122125.JPG  (younger than 2 hours)  
172.30.165.213_20241231_062125.JPG  (older than 2 hours, but the most recent file for this IP address)  
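To try a cleanup script against this example without touching /mnt/moe/results, one option is to recreate the file set as empty files in a temporary directory. This is only a sketch; `demo_dir` and `names` are illustrative variable names, and it relies on `mktemp -d` being available.

```bash
#!/usr/bin/env bash
# Recreate the example file set as empty files in a throwaway directory.
# Only the names matter: contents and modification times are ignored (rule 3).

demo_dir=$(mktemp -d)
names=(
    172.30.165.212_20241231_132125.JPG
    172.30.165.212_20241231_122125.JPG
    172.30.165.212_20241231_112125.JPG
    172.30.165.212_20241231_102125.JPG
    172.30.165.212_20241231_092125.JPG
    172.30.165.213_20241231_062125.JPG
    172.30.165.213_20241231_032125.JPG
    172.30.165.213_20241231_012125.JPG
)
for name in "${names[@]}"; do
    touch "$demo_dir/$name"
done
echo "Created ${#names[@]} files in $demo_dir"
```

A script can then be pointed at "$demo_dir" (for example by setting TARGET_DIR to it). Because the example's "current time" is 20241231 13:30, the script's notion of "now" also has to be pinned to that moment to reproduce the expected keep/delete lists.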

Solution

  • Instead of trying to critique a 100+ line script, I propose the following alternative:

    $ cat delfiles
    #!/bin/bash
    
    TARGET_DIR="/mnt/moe/results"                                # OP's directory; update accordingly (eg, TARGET_DIR='.' in my case)
    
    unset prev_ip
    
    printf -v now "%(%s)T" -1                                    # get current time in epoch format (bash 4.2+)
    
    now=$(date -d '2024-12-31 13:30:00' '+%s')                   # pinned to OP's 'current date' of '2024-12-31 13:30:00' (local time);
                                                                 # otherwise comment/remove this line for normal operations
    
    (( now-=7200 ))                                              # subtract 2 hours
    
    while read -r fname
    do
        IFS='_' read -r ip dt tm ext <<< "${fname}"
    
        [[ "${ip}" != "${prev_ip}" ]] && {                       # if new ip then this is the latest file for said ip so ...
            prev_ip="${ip}"                                      # save the new ip and ...
            continue                                             # skip to next file (ie, keep this file)
        }
    
        epoch=$(date -d "${dt:0:4}-${dt:4:2}-${dt:6:2} ${tm:0:2}:${tm:2:2}:${tm:4:2}" '+%s')
    
        (( epoch < now )) && echo rm "${TARGET_DIR}/${fname}"    # if file's epoch is more than 2 hrs old then remove the file;
                                                                 # NOTE: remove the 'echo' to perform the actual deletion
    
    done < <(find "${TARGET_DIR}" -type f -iname '*.jpg' -printf "%f\n" | sort -rV)
    

    NOTES:

    • we sort the find results with -rV (reverse order, Version sort) on the ip + date/timestamp, so each ip's files are grouped together with its newest file first
    • OP can add extra checks (eg, file name matches a given format) and informational messages as needed
    • OP can improve performance a bit by reducing the number of $(date -d ... '+%s') subshell calls with something like this answer's coproc solution

    Running it against OP's file set (with TARGET_DIR='.') generates:

    $ ./delfiles
    rm ./172.30.165.213_20241231_032125.JPG
    rm ./172.30.165.213_20241231_012125.JPG
    rm ./172.30.165.212_20241231_112125.JPG
    rm ./172.30.165.212_20241231_102125.JPG
    rm ./172.30.165.212_20241231_092125.JPG
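A possible variant of the `delfiles` loop, offered as a sketch rather than a replacement. It validates and splits each name with a bash regex (`BASH_REMATCH`), and, in place of the coproc idea from the notes, it avoids one `date` call per file by computing a YYYYMMDDHHMMSS cutoff once and comparing the fixed-width timestamps as integers. `clean_dir` is a hypothetical name; the sketch assumes bash 4.2+, GNU `find`/`sort`, images directly inside the target directory, and timestamps in local time. Like the original, it only echoes the `rm` commands. The regex also sidesteps the problem in the question's script: there, `IFS='_' read -r ip date time ext` leaves `.JPG` attached to `time` and `ext` empty, so the format check skips every file.

```bash
#!/usr/bin/env bash
# Sketch of a delfiles variant (not the answer's exact code):
#  - names are validated and split with a bash regex (BASH_REMATCH)
#  - age is checked against a YYYYMMDDHHMMSS cutoff computed once,
#    so there is no `date` call per file
# Assumes bash 4.2+, GNU find/sort, images directly in the target directory,
# and filename timestamps in local time. Dry run: it only prints rm commands.
# clean_dir is a hypothetical name.

clean_dir() {
    local dir=$1 now=${2:-}
    local re='^([0-9.]+)_([0-9]{8})_([0-9]{6})\.[Jj][Pp][Gg]$'
    local cutoff fname ip stamp prev_ip=

    [[ -n $now ]] || printf -v now '%(%s)T' -1                  # default: current time
    printf -v cutoff '%(%Y%m%d%H%M%S)T' "$(( now - 7200 ))"     # 2 hours earlier, local time

    while IFS= read -r fname; do
        if ! [[ $fname =~ $re ]]; then
            echo "skipping (unexpected name): $fname" >&2
            continue
        fi
        ip=${BASH_REMATCH[1]}
        stamp=${BASH_REMATCH[2]}${BASH_REMATCH[3]}              # e.g. 20241231132125

        if [[ $ip != "$prev_ip" ]]; then                        # newest file for this IP: always keep
            prev_ip=$ip
            continue
        fi
        # Fixed-width YYYYMMDDHHMMSS values order chronologically, so an
        # integer comparison (10# forces base 10) stands in for date -d
        if (( 10#$stamp < 10#$cutoff )); then
            echo rm "$dir/$fname"
        fi
    done < <(find "$dir" -maxdepth 1 -type f -iname '*.jpg' -printf '%f\n' | sort -rV)
}
```

For example, `clean_dir . "$(date -d '2024-12-31 13:30:00' '+%s')"` run in a directory holding OP's file set should print the same five rm lines as above; with no second argument it uses the current time. Removing the `echo` performs the actual deletion.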