gawk - suppress output of matched lines


I'm running into an issue where gawk prints unwanted output. I want to find lines in a file that match an expression, test to see if the information in the line matches a certain condition, and then print the line if it does. I'm getting the output that I want, but gawk is also printing every line that matches the expression rather than just the lines that meet the condition.

I'm trying to search through files containing dates and times for certain actions to be executed. I want to show only lines that contain times in the future. The dates are formatted like so:

text... 2016-01-22 10:03:41 more text...

I tried using sed to print all lines starting from the first one that has the current hour, but there is no guarantee that the file contains a line with that hour (and no guarantee that the lines share any particular year, month, day, etc.), so I needed something more robust. I decided to try converting the times into seconds since the epoch and comparing that to the current systime. If the conversion produces a number greater than systime, I want to print that line.

Right now it seems like gawk's mktime() function is the key to this. Unfortunately, it requires input in the following format:

yyyy mm dd hh mm ss

I'm currently searching a test file (called timecomp) for a regular expression matching the date format.

Edit: the test file only contains a date and time on each line, no other text.

I used sed to replace the date separators (i.e. /, -, and :) with a space, and then piped the output to a gawk script called stime using the following statement:

sed -e 's/[-://_]/ /g' timecomp | gawk -f stime

Here is the script

# stime
BEGIN { tsec=systime();  } /.*20[1-9][0-9] [0-1][1-9] [0-3][0-9] [0-2][0-9][0-6][0-9] [0-6][0-9]/ { 
    if (tsec < mktime($0))
        print "\t" $0    # the tab is just to differentiate the desired output from the other lines that are being printed.
} $1

Right now this is getting the basic information that I want, but it is also printing every line that matches the original expression, rather than just the lines containing a time in the future. Sample output:

2016 01 22 13 23 20
2016 01 22 14 56 57
2016 01 22 15 46 46
2016 01 22 16 32 30
    2016 01 22 18 56 23
2016 01 22 18 56 23
    2016 01 22 22 22 28
2016 01 22 22 22 28
    2016 01 22 23 41 06
2016 01 22 23 41 06
    2016 01 22 20 32 33

How can I print only the lines in the future?

Note: I'm doing this on a Mac, but I want it to be portable to Linux because I'm ultimately making this for some tasks I have to do at work.

I'd like to accomplish this in one script rather than requiring the sed statement to reformat the dates, but I'm running into other issues that probably require a different question, so I'm sticking to this for now.

Any help would be greatly appreciated! Thanks!


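Answered: I had a stray $1 on the last line of my script, and that was the cause of the additional output. In awk, a pattern with no action prints the line whenever the pattern is true, and $1 is true for every line whose first field is non-empty and not zero, so each line was printed regardless of the time comparison.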
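
For reference, here is a minimal sketch of what a corrected version could look like, with the separator replacement done inside gawk so that the separate sed step is not needed. It assumes gawk (mktime() and systime() are gawk extensions) and timestamps in the yyyy-mm-dd hh:mm:ss form shown in the question. The function name future_lines is made up for illustration.

```shell
#!/bin/bash
# future_lines is a hypothetical wrapper name, not part of the original script.
# Prints only the input lines whose timestamp lies in the future.
future_lines() {
    gawk '
        BEGIN { now = systime() }
        # Act only on lines that contain a yyyy-mm-dd hh:mm:ss timestamp
        match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            stamp = substr($0, RSTART, RLENGTH)
            gsub(/[-:]/, " ", stamp)    # mktime() wants "yyyy mm dd hh mm ss"
            if (mktime(stamp) > now)
                print                   # the only print in the script
        }
    ' "$@"
}
```

Calling it as `future_lines timecomp` should print just the future lines, since there is no trailing bare pattern left to trigger the default print.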


Solution

  • Instead of awk, this is an (almost) pure Bash solution:

    #!/bin/bash
    
    # Regex for time string
    re='[0-9]{4}-[0-9]{2}-[0-9]{2} ([0-9]{2}:){2}[0-9]{2}'
    
    # Current time, in seconds since epoch
    now=$(date +%s)
    
    while IFS= read -r line; do
    
        # Match time string; skip lines that don't contain one
        [[ $line =~ $re ]] || continue
        time_string="${BASH_REMATCH[0]}"
    
        # Convert time string to seconds since epoch
        time_secs=$(date -d "$time_string" +%s)
    
        # If time is in the future, print line
        if (( time_secs > now )); then
            echo "$line"
        fi
    
    done < <(grep 'pattern' "$1")
    

    This takes advantage of the date formatting in GNU Coreutils to convert a date to seconds since epoch for easy comparison of two dates:

    $ date
    Fri, Jan 22, 2016 11:23:59 PM
    $ date +%s
    1453523046
    

    And GNU date's -d option to take a date string as input (the BSD date shipped with macOS does not parse date strings with -d):

    $ date -d '2016-01-22 10:03:41' +%s
    1453475021
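
Since the question asks for something that runs on both macOS and Linux, the conversion step could be wrapped along these lines. This is only a sketch: to_epoch is a hypothetical helper name, it assumes the yyyy-mm-dd hh:mm:ss format, and it assumes that only GNU date accepts --version, which is used here to tell the two implementations apart.

```shell
#!/bin/bash
# to_epoch is a hypothetical helper, not part of the original answer.
# Converts "yyyy-mm-dd hh:mm:ss" to seconds since epoch with GNU or BSD date.
to_epoch() {
    local stamp=$1
    if date --version >/dev/null 2>&1; then
        # GNU date (Linux): -d parses the date string
        date -d "$stamp" +%s
    else
        # BSD date (macOS): -j leaves the clock alone, -f gives the input format
        date -j -f '%Y-%m-%d %H:%M:%S' "$stamp" +%s
    fi
}
```

In the script above, `time_secs=$(to_epoch "$time_string")` would then take the place of the direct `date -d` call.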
    

    The script does the following:

    • Filter the input file with grep (for lines containing a generic pattern, but could be anything)
    • Loop over lines containing pattern
    • Match the line with a regex that matches the date/time string yyyy-mm-dd hh:mm:ss and extract the match
    • Convert the time string to seconds since epoch
    • Compare that value to the time in $now, which is the current date/time in seconds since epoch
    • If the time from the logfile is in the future, print the line

    For an example input file like this one

    text 2016-01-22 10:03:41 with time in the past
    more text 2016-01-22 10:03:41 matching pattern but in the past
    other text 2017-01-22 10:03:41 in the future matching pattern
    some text 2017-01-23 10:03:41 in the future but not matching
    blahblah 2022-02-22 22:22:22 pattern and also in the future
    

    the result is

    $ date
    Fri, Jan 22, 2016 11:36:54 PM
    $ ./future_time logfile
    other text 2017-01-22 10:03:41 in the future matching pattern
    blahblah 2022-02-22 22:22:22 pattern and also in the future
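
As a possible variation, timestamps in the fixed yyyy-mm-dd hh:mm:ss format sort in the same order as the times they represent, so they can be compared as plain strings without calling date for every line. The sketch below assumes that all timestamps are in the local time zone and always zero-padded. The names is_future and print_future are hypothetical.

```shell
#!/bin/bash
# Regex for time string (same as in the script above)
re='[0-9]{4}-[0-9]{2}-[0-9]{2} ([0-9]{2}:){2}[0-9]{2}'

# is_future (hypothetical helper): succeeds when timestamp $1 sorts after
# the reference timestamp $2, which defaults to the current local time.
is_future() {
    local stamp=$1
    local ref=${2:-$(date '+%Y-%m-%d %H:%M:%S')}
    [[ $stamp > $ref ]]
}

# print_future (hypothetical helper): reads lines on stdin and prints those
# whose first timestamp is later than the optional reference timestamp $1.
print_future() {
    local line
    while IFS= read -r line; do
        [[ $line =~ $re ]] || continue    # skip lines without a timestamp
        if is_future "${BASH_REMATCH[0]}" "$1"; then
            printf '%s\n' "$line"
        fi
    done
}
```

Used as `grep 'pattern' logfile | print_future`, this should select the same lines as the script above, as long as the stated assumptions about format and time zone hold.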