Tags: bash, shell, awk, cut

Cut and sort delimited dates from stdout via pipe


I am trying to extract the dates from some strings read from stdout, but there are two cases:

full.20201004T033103Z.vol93.difftar.gz
full.20201007T033103Z.vol94.difftar.gz

This should produce 20201007T033103Z, the date nearest to now (the newest).

Or:

inc.20200830T033103Z.to.20200906T033103Z.vol1.difftar.gz
inc.20200929T033103Z.to.20200908T033103Z.vol10.difftar.gz

Here I need the second date (the one after .to.), not the first, and only the newest date should be printed: 20200908T033103Z

What I tried:

cat dates_file | awk -F '.to.' 'NF > 1 {print $2}' | cut -d\. -f1 | sort -r -t- -k3.1,3.4 -k2,2 | head -1

This only works for the second case and does not cover the first. I am also not sure the date-sorting logic is correct.

Here is some sample data:

full.20201004T033103Z.vol93.difftar.gz
full.20201004T033103Z.vol94.difftar.gz
full.20201004T033103Z.vol95.difftar.gz
full.20201004T033103Z.vol96.difftar.gz
full.20201004T033103Z.vol97.difftar.gz
full.20201004T033103Z.vol98.difftar.gz
full.20201004T033103Z.vol99.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.manifest
inc.20200830T033103Z.to.20200906T033103Z.vol1.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol10.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol11.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol12.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol13.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol14.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol15.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol16.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol17.difftar.gz

Solution

  • To get the most recent date from your sample data, you can use this awk command:

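    awk '{
       # drop the prefix: everything through ".to." if present, else the first field
       sub(/^(.*\.to|[^.]+)\./, "")
       # drop everything after the date, plus the T and Z markers
       gsub(/\..+$|[TZ]/, "")
    }
    # digits-only timestamps compare chronologically
    $0 > max {
       max = $0
    }
    END {
       print max
    }' file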
    
    20201004033103
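
If the timestamp should keep its original form, as in the expected output in the question (for example 20201007T033103Z), one alternative is to split each name on dots rather than strip characters. The following is a minimal sketch and not part of the answer above. The function name newest_date is made up for illustration, and the sketch assumes every name follows one of the two patterns shown in the question. It relies on the timestamps being fixed-width with the most significant digits first, so a plain string comparison orders them chronologically.

```shell
# newest_date is a hypothetical helper name; it reads archive names on stdin.
newest_date() {
    awk -F. '
        # Split on ".": incremental names have "to" as field 3, so the
        # end date is field 4; full names carry the date in field 2.
        { d = ($3 == "to") ? $4 : $2 }
        # Fixed-width YYYYMMDDTHHMMSSZ strings compare chronologically.
        d > max { max = d }
        END { print max }
    '
}
```

With the sample data from the question saved as dates_file, running `newest_date < dates_file` should print 20201004T033103Z.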