I am trying to remove log entries that are more than 7 days older than the newest entry. For example, if the last section of the log is dated 2024-02-13, then all entries from 2024-02-05 should be removed.
The example log file:
* Server Name: myserver
* Date and Time: 2024-02-05 23:00:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30
* Server Name: myserver
* Date and Time: 2024-02-05 23:30:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30
* Server Name: myserver
* Date and Time: 2024-02-06 00:00:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30
.
.
.
* Server Name: myserver
* Date and Time: 2024-02-13 23:00:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30
* Server Name: myserver
* Date and Time: 2024-02-13 23:30:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30
Expected output:
* Server Name: myserver
* Date and Time: 2024-02-06 00:00:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30
.
.
.
* Server Name: myserver
* Date and Time: 2024-02-13 23:00:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30
* Server Name: myserver
* Date and Time: 2024-02-13 23:30:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30
I tried to do this with awk, but it did not work as I wanted: it just cleared the whole log file.
My code is below:
#!/bin/bash
LOG_FILE="logs/worker_count.log"
TMP_FILE="$LOG_FILE.tmp"
# Extract the newest and oldest dates from the log file
newest_date=$(grep -oP 'Date and Time: \K[^\n]+' "$LOG_FILE" | tail -n 1)
oldest_date=$(grep -oP 'Date and Time: \K[^\n]+' "$LOG_FILE" | head -n 1)
# Check if either date is empty
if [ -z "$newest_date" ] || [ -z "$oldest_date" ]; then
    echo "Error: Unable to extract dates from the log file."
    exit 1
fi
# Convert dates to timestamps for comparison
newest_timestamp=$(date -d "$newest_date" +"%s")
oldest_timestamp=$(date -d "$oldest_date" +"%s")
# Calculate the difference in seconds
time_difference=$((newest_timestamp - oldest_timestamp))
# If the difference is greater than or equal to 7 days (604800 seconds)
if [ "$time_difference" -ge 604800 ]; then
    # Calculate the cutoff date based on the newest date minus 7 days
    cutoff_date=$(date -d "@$((newest_timestamp - 604800))" +"%Y-%m-%d %T")
    # Extract entries within the specified date range and remove old entries
    awk -v cutoff="$cutoff_date" '/^(\* Server Name:|\* Date and Time:)/ {
        server_name = $NF
        getline datetime
        if (datetime >= cutoff) {
            print "* Server Name: " server_name
            print datetime
            for (i = 1; i <= 5; i++) {
                getline line
                print line
            }
        }
    }
    ' "$LOG_FILE" > "$TMP_FILE"
    # Replace the original log file with the filtered entries
    #mv "$TMP_FILE" "$LOG_FILE"
else
    echo "No need to remove old entries. Time difference is less than 7 days."
fi
Could anyone help me with this?
If you have tac and GNU date, reading the file backwards could be efficient:
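tac "$LOG_FILE" |
awk -v RS= '
    # RS= selects paragraph mode: sections separated by blank lines are records.
    # In the reversed record, $(NF-5) is the date and $(NF-4) is the time.
    /Date and Time/ {
        # The first record is the newest one. With the time passed before the
        # date, GNU date reads "-7days" as a relative offset, not a time zone.
        if (!cutoff)
            ("date -d \"" $(NF-4) " " $(NF-5) " -7days\" +%F%T") | getline cutoff
        else if ($(NF-5) $(NF-4) < cutoff)
            exit    # all remaining sections are older than the cutoff
    }
    { print ORS $0 }
' |
tac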
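As for why the original attempt wrote an empty file: getline datetime stores the whole line, including the leading "* Date and Time: ", so the string comparison against a cutoff that starts with a digit is most likely always false and nothing is printed.

If tac is not available, or the sections are not separated by blank lines, a two-pass approach is another option. The following is only a minimal sketch under some assumptions: the function name filter_recent is hypothetical, every section starts with a "* Server Name:" line followed by a "* Date and Time:" line, the last section in the file is the newest, and GNU date is available. It uses a calendar-day cutoff, which is what the expected output in the question suggests (the 2024-02-06 00:00:01 section is kept), whereas the tac pipeline above compares full timestamps.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): print only the sections
# dated no earlier than 7 calendar days before the newest section.
# Assumes GNU date and that the last "Date and Time" line is the newest.
filter_recent() {
    local log_file="$1"
    local newest_day cutoff_day

    # Pass 1: the date part (field 5) of the last "Date and Time" line.
    newest_day=$(awk '/^\* Date and Time:/ { day = $5 } END { print day }' "$log_file")
    if [ -z "$newest_day" ]; then
        echo "Error: no dates found in $log_file" >&2
        return 1
    fi

    # Date only, so "-7 days" is read as a relative offset.
    cutoff_day=$(date -d "$newest_day -7 days" +%F) || return 1

    # Pass 2: hold each "Server Name" line until the date of its section is
    # known, then print the section only if it is on or after the cutoff day.
    awk -v cutoff="$cutoff_day" '
        /^\* Server Name:/   { header = $0; next }
        /^\* Date and Time:/ { keep = ($5 >= cutoff); if (keep) print header }
        keep
    ' "$log_file"
}
```

In the original script this could replace the awk step, for example filter_recent "$LOG_FILE" > "$TMP_FILE", followed by the mv once the output has been checked.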