Search code examples
dateawk

Awk: In search for better ways of filtering through date columns


I have the following file called dates.csv

1,2020-01-27,mom,walk1
2,2023-07-27,dad,walk2
3,2020-06-04,uncle jeff,walk2
4,2022-04-26,uncle celso,walk3
5,2020-01-06,mom,walk2
6,2023-01-19,val,walk1
7,2019-12-02,otice,walk2
8,2021-01-20,uncle celso,walk1
9,2021-04-16,dad,walk3
10,2023-10-03,dad,walk2
11,2021-04-16,uncle jeff,walk3
12,2019-10-11,uncle jeff,walk3
13,2023-01-04,dad,walk3
14,2019-08-22,uncle celso,walk2
15,2022-04-27,val,walk3
16,2019-04-01,dad,walk1
17,2021-01-27,uncle jeff,walk2
18,2019-12-02,val,walk2
19,2022-04-20,uncle celso,walk2
20,2022-07-04,uncle jeff,walk3

I am trying to filter dates in a more sophisticated way than I am doing using awk. What I am currently doing is the following. If I want select dates larger than or equal to 2020-06-04 I will run

awk 'BEGIN{FS=OFS=","}$2 >= 20200604{print $2=gensub(/(....)-(..)-(..)/,"\\1\\2\\3",1)}' dates.txt | \
sed 's/,\(....\)\(..\)\(..\)/,\1-\2-\3/1'

which gives

2,2023-07-27,dad,walk2
4,2022-04-26,uncle celso,walk3
6,2023-01-19,val,walk1
8,2021-01-20,uncle celso,walk1
9,2021-04-16,dad,walk3
10,2023-10-03,dad,walk2
11,2021-04-16,uncle jeff,walk3
13,2023-01-04,dad,walk3
15,2022-04-27,val,walk3
17,2021-01-27,uncle jeff,walk2
19,2022-04-20,uncle celso,walk2
20,2022-07-04,uncle jeff,walk3

If I want to select dates smaller than or equal to 2020-06-04 I will run

awk 'BEGIN{FS=OFS=","}$2 <= 20200604{print $2=gensub(/(....)-(..)-(..)/,"\\1\\2\\3",1)}' dates.txt | \
sed 's/,\(....\)\(..\)\(..\)/,\1-\2-\3/1'

which results in

1,2020-01-27,mom,walk1
3,2020-06-04,uncle jeff,walk2
5,2020-01-06,mom,walk2
7,2019-12-02,otice,walk2
12,2019-10-11,uncle jeff,walk3
14,2019-08-22,uncle celso,walk2
16,2019-04-01,dad,walk1
18,2019-12-02,val,walk2

I have been trying around with mktime(), strftime() but am not able to get anything to work that is better than what I have got. I appreciate any ideas on this in advance.


Solution

  • You can simplify like this:

    awk -v date=2020-06-04 'BEGIN{FS=OFS=","} $2 >= date' file