macos shell terminal date-comparison

How to get all lines including dates before today from a textfile in the macOS terminal?


I have a text file containing many dates, and I want to grep or otherwise find all the dates before today.

Some lines look like abc def ghi:2018-06-20 mno pqr; others contain no date. The dates are in no particular order. I want to get every line of the file that contains a date before today (unsorted, in the order the lines appear in the file).

What I know:

  • How to get today's date with date +%Y-%m-%d and save it in a variable $A
  • How to get the lines containing that date: grep $A file.txt

I know how to put this in a for-loop to cover, say, a few days of a week. But how can I find all the dates before today? I think I need a comparison along the lines of: if $A > $B, do grep $B file.txt.

Thank you for your help!

[Yes, I searched a lot but I did not find my solution anywhere.]


Solution

    $ today="$(date "+%s")"
    $ input="/tmp/file.txt"
    $ cat "${input}"
    abc def ghi:2018-06-25 mno pqr
    abc def ghi:2018-06-24 mno pqr
    abc def ghi:2018-06-23 mno pqr
    abc def ghi:2018-06-22 mno pqr
    abc def ghi:2018-06-21 mno pqr
    abc def ghi:2018-06-20 mno pqr
    def ghi:2018-06-20 mno pqr
    abc ghi:2018-06-20mno pqr abc
    abc def ghi:2017-06-20 mno pqr
    abc def2018-06-20 mno pqr
    abc def ghi:2018-06-19 mno pqr
    def ghi:2018-06-21 mno pqr
    abc ghi:2018-07-20 mno pqr
    abc def ghi:2018-06-20 mno pqr
    abc def2018-05-20 mno pqr
    1sss018-05-20 mno pqr
    1sss05-20-2018 mno pqr
    
    $ sed -n 's/.*\([[:digit:]]\{4\}-[[:digit:]]\{2\}-[[:digit:]]\{2\}\).*/\1/p' "${input}" \
    | sort -u \
    | xargs -n1 date -j -f '%Y-%m-%d' '+%s' \
    | xargs -n1 -I% awk 'BEGIN{if(%<'${today}'){print %}}' \
    | xargs -n1 date -j -f '%s' '+%Y-%m-%d' \
    | xargs -n1 -I% grep % "${input}" \
    | sort -u
    abc def ghi:2017-06-20 mno pqr
    abc def ghi:2018-06-19 mno pqr
    abc def ghi:2018-06-20 mno pqr
    abc def ghi:2018-06-21 mno pqr
    abc def ghi:2018-06-22 mno pqr
    abc def2018-05-20 mno pqr
    abc def2018-06-20 mno pqr
    abc ghi:2018-06-20mno pqr abc
    def ghi:2018-06-20 mno pqr
    def ghi:2018-06-21 mno pqr
    

$today is the current time in seconds since the epoch, and $input is the file you want to parse. sed extracts the dates (without verifying that they are real dates; for instance, 0000-99-99 would match). The first sort removes duplicate dates. The first xargs/date converts each date found into seconds since the epoch, xargs/awk prints only those timestamps that are earlier than $today, and the next xargs/date converts them back to "%Y-%m-%d". xargs/grep then finds every line of the input file containing one of those dates, and the last sort removes duplicated lines (as a side effect, the output is sorted rather than in file order).
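
Because YYYY-MM-DD dates sort the same way as plain strings, the comparison could also be done in a single awk pass that keeps the lines in file order. The following is a minimal sketch of that alternative, not the pipeline from the answer above. The function name lines_before is hypothetical, and the sketch assumes that only the first date-shaped substring on each line matters (the sed command above captures the last one instead). Like sed, it does not check that a match is a real calendar date.

```shell
#!/bin/sh
# Print every line whose first YYYY-MM-DD substring is earlier than a cutoff.
# lines_before is a hypothetical helper name used only for this sketch.
lines_before() {
    cutoff="$1"   # cutoff date in YYYY-MM-DD form, e.g. 2018-06-23
    file="$2"     # file to scan
    awk -v cutoff="$cutoff" '
        # match() sets RSTART/RLENGTH to the first date-shaped substring;
        # lines without such a substring are skipped.
        match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
            d = substr($0, RSTART, RLENGTH)
            # String comparison: ISO dates order lexicographically
            # exactly as they order on the calendar.
            if (d < cutoff) print
        }
    ' "$file"
}

# Usage (today as the cutoff):
#   lines_before "$(date +%Y-%m-%d)" /tmp/file.txt
```

Since this only uses date +%Y-%m-%d and POSIX awk features, it should not depend on the BSD-specific date -j -f options. Duplicate lines are printed as often as they occur in the file; pipe the result through sort -u if they should be collapsed.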