gawk - suppress output of matched lines

I'm running into an issue where gawk prints unwanted output. I want to find lines in a file that match an expression, test to see if the information in the line matches a certain condition, and then print the line if it does. I'm getting the output that I want, but gawk is also printing every line that matches the expression rather than just the lines that meet the condition.

I'm trying to search through files containing dates and times for certain actions to be executed. I want to show only lines that contain times in the future. The dates are formatted like so:

text... 2016-01-22 10:03:41 more text...

I tried using sed to print all lines starting from the first one that contains the current hour, but there is no guarantee that the file contains a line with that hour (or that the lines share any particular year, month, or day), so I needed something more robust. I decided to convert the times into seconds since the epoch and compare that to the current systime(). If the conversion produces a number greater than systime(), I want to print that line.

Right now it seems like gawk's mktime() function is the key to this. Unfortunately, it requires input in the following format:

yyyy mm dd hh mm ss

I'm currently searching a test file (called timecomp) for a regular expression matching the date format.

Edit: the test file only contains a date and time on each line, no other text.

I used sed to replace the date separators (/, -, _ and :) with a space, and then piped the output to a gawk script called stime using the following statement:

sed -e 's/[-://_]/ /g' timecomp | gawk -f stime

Here is the script

# stime
BEGIN { tsec=systime();  } /.*20[1-9][0-9] [0-1][1-9] [0-3][0-9] [0-2][0-9] [0-6][0-9] [0-6][0-9]/ { 
    if (tsec < mktime($0))
        print "\t" $0    # the tab is just to differentiate the desired output from the other lines that are being printed.
} $1

Right now this is getting the basic information that I want, but it is also printing every line that matches the original expression, rather than just the lines containing a time in the future. Sample output:

2016 01 22 13 23 20
2016 01 22 14 56 57
2016 01 22 15 46 46
2016 01 22 16 32 30
    2016 01 22 18 56 23
2016 01 22 18 56 23
    2016 01 22 22 22 28
2016 01 22 22 22 28
    2016 01 22 23 41 06
2016 01 22 23 41 06
    2016 01 22 20 32 33

How can I print only the lines in the future?

Note: I'm doing this on a Mac, but I want it to be portable to Linux because I'm ultimately making this for some tasks I have to do at work.

I'd like to accomplish this in one script rather than requiring the sed statement to reformat the dates, but I'm running into other issues that probably require a different question, so I'm sticking to this for now.

Any help would be greatly appreciated! Thanks!

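Answered: I had a stray $1 on the last line of my script, and that was the cause of the additional output. In awk, a pattern with no action prints every record for which the pattern is true, and $1 is true for any first field that is non-empty and not zero, so that rule printed every line regardless of its time.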
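
For reference, a corrected version might look like the sketch below. It is wrapped in a shell function so it can be pasted into a terminal; the name future_lines and the optional second argument (a fixed "now", handy for testing) are illustrative and not from the original script. As an assumption about what a single-script version could look like, it also folds the sed step into gawk with gsub(), and it expects - and : as separators, as in the example line near the top.

```shell
#!/bin/bash
# future_lines FILE [NOW]
# Print the lines of FILE whose "yyyy-mm-dd hh:mm:ss" timestamp is later
# than NOW (seconds since the epoch; defaults to the current time).
# The function name and the optional NOW argument are illustrative.
future_lines() {
    gawk -v now="${2:-}" '
        BEGIN { if (now == "") now = systime() }

        # match() sets RSTART and RLENGTH when a timestamp is found in the line.
        match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            stamp = substr($0, RSTART, RLENGTH)
            gsub(/[-:]/, " ", stamp)    # mktime() expects "yyyy mm dd hh mm ss"
            if (mktime(stamp) > now)
                print                   # the only print; no trailing $1 pattern
        }
    ' "$1"
}
```

Called as future_lines timecomp, it prints only the lines whose time is later than systime(). mktime() and systime() are not part of POSIX awk, so this relies on gawk being installed.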

Solution

Instead of awk, this is an (almost) pure Bash solution:

#!/bin/bash

# Regex for time string
re='[0-9]{4}-[0-9]{2}-[0-9]{2} ([0-9]{2}:){2}[0-9]{2}'

# Current time, in seconds since epoch
now=$(date +%s)

while IFS= read -r line; do

    # Match time string; skip lines that do not contain one
    [[ $line =~ $re ]] || continue
    time_string="${BASH_REMATCH[0]}"

    # Convert time string to seconds since epoch
    time_secs=$(date -d "$time_string" +%s)

    # If time is in the future, print line
    if (( time_secs > now )); then
        echo "$line"
    fi

done < <(grep 'pattern' "$1")

This uses the +%s format of date from GNU Coreutils to express a date as seconds since the epoch, which makes two dates easy to compare:

$ date
Fri, Jan 22, 2016 11:23:59 PM
$ date +%s
1453523046

The -d option takes a date string as input. This is a GNU extension; the BSD date that ships with macOS does not parse a date string this way:

$ date -d '2016-01-22 10:03:41' +%s
1453475021
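
Since the question asks for something that runs on both macOS and Linux, the conversion could be wrapped so that it picks the right flags for the installed date. This is only a sketch: the function name to_epoch is made up for illustration, and it assumes that a date which accepts --version is GNU date and that anything else is a BSD-style date.

```shell
# to_epoch 'yyyy-mm-dd hh:mm:ss'
# Print the timestamp as seconds since the epoch, using whichever date
# implementation is installed. The function name is illustrative.
to_epoch() {
    if date --version >/dev/null 2>&1; then
        # GNU date: -d parses the date string
        date -d "$1" +%s
    else
        # BSD/macOS date: -j means "do not set the clock",
        # -f gives the format of the input string
        date -j -f '%Y-%m-%d %H:%M:%S' "$1" +%s
    fi
}
```

The result depends on the local time zone, because the string itself carries none. With TZ=UTC, to_epoch '2016-01-22 10:03:41' prints 1453457021; the value shown above is five hours later because it was produced in a UTC-5 zone.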

The script does the following:

Filter the input file with grep (for lines containing a generic pattern, but could be anything)
Loop over lines containing pattern
Match the line with a regex that matches the date/time string yyyy-mm-dd hh:mm:ss and extract the match
Convert the time string to seconds since epoch
Compare that value to the time in $now, which is the current date/time in seconds since epoch
If the time from the logfile is in the future, print the line
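
The per-line steps (match, convert, compare) can also be isolated in a small function, which makes the logic easy to check against a fixed time. The name is_future and its second parameter are illustrative and not part of the script above; like the script, this sketch assumes GNU date for -d.

```shell
# is_future LINE NOW
# Succeed (exit status 0) if LINE contains a timestamp later than NOW,
# where NOW is given in seconds since the epoch. Requires GNU date.
is_future() {
    local re='[0-9]{4}-[0-9]{2}-[0-9]{2} ([0-9]{2}:){2}[0-9]{2}'
    local secs
    [[ $1 =~ $re ]] || return 1                            # no timestamp in the line
    secs=$(date -d "${BASH_REMATCH[0]}" +%s) || return 1   # not a valid date
    (( secs > $2 ))                                        # status 0 only if in the future
}
```

Inside the loop it would be used as: is_future "$line" "$now" && echo "$line"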

For an example input file like this one

text 2016-01-22 10:03:41 with time in the past
more text 2016-01-22 10:03:41 matching pattern but in the past
other text 2017-01-22 10:03:41 in the future matching pattern
some text 2017-01-23 10:03:41 in the future but not matching
blahblah 2022-02-22 22:22:22 pattern and also in the future

the result is

$ date
Fri, Jan 22, 2016 11:36:54 PM
$ ./future_time logfile
other text 2017-01-22 10:03:41 in the future matching pattern
blahblah 2022-02-22 22:22:22 pattern and also in the future
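
As an aside on portability: because yyyy-mm-dd hh:mm:ss is zero-padded and ordered from the largest unit to the smallest, two such strings sort in the same order as the times they represent. That allows a different technique from the one above, a plain string comparison, which needs neither mktime() nor date -d and so should behave the same with the stock awk on macOS and on Linux. The sketch below assumes the timestamps in the file and the current time are in the same time zone; the function name future_lines_lexical and its optional second argument are made up for illustration.

```shell
# future_lines_lexical FILE [NOW]
# NOW is a "yyyy-mm-dd hh:mm:ss" string; it defaults to the current local time.
future_lines_lexical() {
    local now=${2:-$(date '+%Y-%m-%d %H:%M:%S')}
    awk -v now="$now" '
        match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            # Both operands are strings, so ">" compares them character by character.
            if (substr($0, RSTART, RLENGTH) > now)
                print
        }
    ' "$1"
}
```

To keep the pattern filter from the script above, the function could read from grep instead of a file, for example: grep 'pattern' logfile | future_lines_lexical -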