Tags: regex, bash, shell, unix, awk

How to use awk to print multi-line entries based on a match and a filtering condition


I am trying to remove log entries that are more than 7 days older than the newest entry. For example, if the last entry in the log is dated 2024-02-13, then all entries from 2024-02-05 should be removed.

The example log file:

* Server Name: myserver
* Date and Time: 2024-02-05 23:00:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

* Server Name: myserver
* Date and Time: 2024-02-05 23:30:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

* Server Name: myserver
* Date and Time: 2024-02-06 00:00:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

.
.
.

* Server Name: myserver
* Date and Time: 2024-02-13 23:00:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

* Server Name: myserver
* Date and Time: 2024-02-13 23:30:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

Expected output:

* Server Name: myserver
* Date and Time: 2024-02-06 00:00:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

.
.
.

* Server Name: myserver
* Date and Time: 2024-02-13 23:00:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

* Server Name: myserver
* Date and Time: 2024-02-13 23:30:01
* Total Number of Child (PID): 4
* PID: 117703, Worker Threads: 46
* PID: 117704, Worker Threads: 30
* PID: 117705, Worker Threads: 30
* PID: 117809, Worker Threads: 30

I tried using awk, but it didn't work the way I wanted: it just clears the whole log file.

My code is below:

#!/bin/bash

LOG_FILE="logs/worker_count.log"
TMP_FILE="$LOG_FILE.tmp"

# Extract the newest and oldest dates from the log file
newest_date=$(grep -oP 'Date and Time: \K[^\n]+' "$LOG_FILE" | tail -n 1)
oldest_date=$(grep -oP 'Date and Time: \K[^\n]+' "$LOG_FILE" | head -n 1)

# Check if either date is empty
if [ -z "$newest_date" ] || [ -z "$oldest_date" ]; then
    echo "Error: Unable to extract dates from the log file."
    exit 1
fi

# Convert dates to timestamps for comparison
newest_timestamp=$(date -d "$newest_date" +"%s")
oldest_timestamp=$(date -d "$oldest_date" +"%s")

# Calculate the difference in seconds
time_difference=$((newest_timestamp - oldest_timestamp))

# If the difference is greater than or equal to 7 days (604800 seconds)
if [ "$time_difference" -ge 604800 ]; then
        # Calculate the cutoff date based on the newest date minus 7 days
        cutoff_date=$(date -d "@$((newest_timestamp - 604800))" +"%Y-%m-%d %T")

        # Extract entries within the specified date range and remove old entries
        awk -v cutoff="$cutoff_date" '/^(\* Server Name:|\* Date and Time:)/ {
        server_name = $NF
        getline datetime
        if (datetime >= cutoff) {
          print "* Server Name: " server_name
          print datetime
          for (i = 1; i <= 5; i++) {
            getline line
            print line
          }
        }
      }
    ' "$LOG_FILE" > "$TMP_FILE"

        # Replace the original log file with the filtered entries
        #mv "$TMP_FILE" "$LOG_FILE"
else
        echo "No need to remove old entries. Time difference is less than 7 days."
fi

Could anyone help me with this?


Solution

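  • If you have tac and GNU date, reading the file backwards could be efficient: the newest entry is read first, so awk can compute the cutoff from it and exit at the first entry older than that: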

    tac "$LOG_FILE" |
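    awk -v RS= '
        # RS= is paragraph mode: each blank-line-separated entry is one record.
        # After tac the lines are reversed, so each record ends with
        # "<date> <time> * Server Name: <name>". Assuming a one-word server
        # name, $(NF-5) is the date and $(NF-4) is the time.
        /Date and Time/ {
            # First (newest) entry: ask GNU date for the timestamp 7 days earlier.
            if (!cutoff)
                "date -d \""$(NF-4)" "$(NF-5)" -7days\" +%F%T" | getline cutoff
            else
                if ($(NF-5)$(NF-4) < cutoff) exit
        }
        { print ORS $0 }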
    ' |
    tac
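
As a side note on why the script in the question produces an empty file: getline datetime stores the whole line, including the leading "* Date and Time: ", so the string comparison datetime >= cutoff compares "*" against "2" and is always false. Below is a minimal forward-reading sketch that avoids that by comparing only the date field. It is not the method above, just an alternative under stated assumptions: GNU date is available, entries are separated by blank lines, the "Date and Time" line is the second line of each entry, and the cutoff is by calendar day (which is what the expected output suggests, since it keeps the 2024-02-06 00:00:01 entry). The function name filter_recent_entries is made up for illustration.

```shell
#!/bin/bash
# Sketch: keep only entries dated within 7 calendar days of the newest entry.
# Assumes GNU date and blank-line-separated entries (see the note above).

# filter_recent_entries is a hypothetical helper name, not from the original post.
filter_recent_entries() {
    local log_file=$1 newest_day cutoff_day

    # Field 5 of a "* Date and Time:" line is YYYY-MM-DD; keep the largest one.
    # ISO dates sort chronologically when compared as strings.
    newest_day=$(awk '/^\* Date and Time:/ && $5 > day { day = $5 }
                      END { print day }' "$log_file")
    [ -n "$newest_day" ] || return 1

    # GNU date: the calendar day 7 days before the newest day.
    cutoff_day=$(date -d "$newest_day 7 days ago" +%F) || return 1

    # Paragraph mode (RS = ""): each entry is one record; with FS = "\n",
    # $2 is the "* Date and Time: ..." line of that entry.
    awk -v cutoff="$cutoff_day" '
        BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
        {
            split($2, part, " ")          # part[5] = date, part[6] = time
            if (part[5] >= cutoff) print  # string comparison on YYYY-MM-DD
        }
    ' "$log_file"
}
```

It could be called as filter_recent_entries "$LOG_FILE" > "$TMP_FILE", followed by the mv from the question once the output looks right. Unlike the tac version, it reads the file twice and does not stop early, so it trades some efficiency for not depending on tac or on the entries being in chronological order.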