I'm running into an issue where gawk
prints unwanted output. I want to find lines in a file that match an expression, test to see if the information in the line matches a certain condition, and then print the line if it does. I'm getting the output that I want, but gawk
is also printing every line that matches the expression rather than just the lines that meet the condition.
I'm trying to search through files containing dates and times for certain actions to be executed. I want to show only lines that contain times in the future. The dates are formatted like so:
text... 2016-01-22 10:03:41 more text...
I tried using sed
to just print all lines starting with ones that had the current hour, but there is no guarantee that the file contains a line with that hour, (plus there is no guarantee that the lines all have any particular year, month, day etc.) so I needed something more robust. I decided trying to convert the times into seconds since epoch, and comparing that to the current systime
. If the conversion produces a number greater than systime
, I want to print that line.
Right now it seems like gawk
's mktime()
function is the key to this. Unfortunately, it requires input in the following format:
yyyy mm dd hh mm ss
I'm currently searching a test file (called timecomp
) for a regular expression matching the date format.
Edit: the test file only contains a date and time on each line, no other text.
I used sed
to replace the date separators (i.e. /, -, and :) with a space, and then piped the output to a gawk script called stime
using the following statement:
sed -e 's/[-://_]/ /g' timecomp | gawk -f stime
Here is the script
# stime
BEGIN { tsec=systime(); } /.*20[1-9][0-9] [0-1][1-9] [0-3][0-9] [0-2][0-9][0-6][0-9] [0-6][0-9]/ {
if (tsec < mktime($0))
print "\t" $0 # the tab is just to differentiate the desired output from the other lines that are being printed.
} $1
Right now this is getting the basic information that I want, but it is also printing every like that matches the original expression, rather than just the lines containing a time in the future. Sample output:
2016 01 22 13 23 20
2016 01 22 14 56 57
2016 01 22 15 46 46
2016 01 22 16 32 30
2016 01 22 18 56 23
2016 01 22 18 56 23
2016 01 22 22 22 28
2016 01 22 22 22 28
2016 01 22 23 41 06
2016 01 22 23 41 06
2016 01 22 20 32 33
How can I print only the lines in the future?
Note: I'm doing this on a Mac, but I want it to be portable to Linux because I'm ultimately making this for some tasks I have to do at work.
I'd like trying to accomplish this in one script rather than requiring the sed
statement to reformat the dates, but I'm running into other issues that probably require a different question, so I'm sticking to this for now.
Any help would be greatly appreciated! Thanks!
Answered: I had a $1
at the last line of my script, and that was the cause of the additional output.
Instead of awk, this is an (almost) pure Bash solution:
#!/bin/bash
# Regex for time string
re='[0-9]{4}-[0-9]{2}-[0-9]{2} ([0-9]{2}:){2}[0-9]{2}'
# Current time, in seconds since epoch
now=$(date +%s)
while IFS= read -r line; do
# Match time string
[[ $line =~ $re ]]
time_string="${BASH_REMATCH[0]}"
# Convert time string to seconds since epoch
time_secs=$(date -d "$time_string" +%s)
# If time is in the future, print line
if (( time_secs > now )); then
echo "$line"
fi
done < <(grep 'pattern' "$1")
This takes advantage of the Coreutils date
formatting to convert a date to seconds since epoch for easy comparison of two dates:
$ date
Fri, Jan 22, 2016 11:23:59 PM
$ date +%s
1453523046
And the -d
argument to take a string as input:
$ date -d '2016-01-22 10:03:41' +%s
1453475021
The script does the following:
pattern
, but could be anything)pattern
yyyy-mm-dd hh:mm:ss
and extract the match$now
, which is the current date/time in seconds since epochFor an example input file like this one
text 2016-01-22 10:03:41 with time in the past
more text 2016-01-22 10:03:41 matching pattern but in the past
other text 2017-01-22 10:03:41 in the future matching pattern
some text 2017-01-23 10:03:41 in the future but not matching
blahblah 2022-02-22 22:22:22 pattern and also in the future
the result is
$ date
Fri, Jan 22, 2016 11:36:54 PM
$ ./future_time logfile
other text 2017-01-22 10:03:41 in the future matching pattern
blahblah 2022-02-22 22:22:22 pattern and also in the future