Tags: arrays, bash, shell, awk, multiple-columns

Match a column and delete the duplicates in shell


Input file

Failed,2021-12-14 05:47 EST,On-Demand Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Completed,2021-12-14 05:47 EST,On-Demand Backup,def,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-13 19:33 EST,Scheduled Backup,def,/clients/FORD_730PM_EST_Windows2008,Windows File System  
Failed,2021-12-14 00:09 EST,Scheduled Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System

Expected Output

Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System

I want only those clients (column 4) that never completed successfully and for which no on-demand backup was run.

Code I tried

awk -F ',' '
  $1 ~ /Failed/    { fail[$4] = $0 }
  $1 ~ /Completed/ { delete fail[$4] }
  $3 ~ /Demand/    { delete fail[$4] }
  END { for (i in fail) print fail[i] }
' test

Solution

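  • A single pass is order-dependent: a Failed row that appears after a
    Completed or On-Demand row for the same client adds that client back
    to the array. Reading the file twice avoids this. The first pass
    records every Failed row by client, and the second pass deletes any
    client that has a Completed or On-Demand row. You can use this awk
    command: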

    awk -F, 'NR==FNR {if ($1~/Failed/) fail[$4] = $0; next}
    $1 ~ /Completed/ || $3 ~ /Demand/ {delete fail[$4]}
    END {for (i in fail) print fail[i]}' file file
    
    Output:

    Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System
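
If reading the file twice is not an option (for example, when the data arrives on a pipe), the same result can probably be had in one pass by tracking the disqualified clients in a second array instead of deleting entries as you go. This is only a sketch, not part of the answer above: the array name `bad` and the shell variables `data` and `result` are made up for illustration, and the sample rows are copied from the question.

```shell
# Sample rows from the question, held in a variable so the example is self-contained.
data='Failed,2021-12-14 05:47 EST,On-Demand Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Completed,2021-12-14 05:47 EST,On-Demand Backup,def,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-13 19:33 EST,Scheduled Backup,def,/clients/FORD_730PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System'

# One pass over the input:
#   bad[client]  marks any client with a Completed or On-Demand row
#   fail[client] keeps the last Failed row seen for that client
# Filtering happens in END, so the order of the rows no longer matters.
result=$(printf '%s\n' "$data" | awk -F, '
    $1 ~ /Completed/ || $3 ~ /Demand/ { bad[$4] = 1 }
    $1 ~ /Failed/                     { fail[$4] = $0 }
    END {
        for (c in fail)
            if (!(c in bad)) print fail[c]
    }
')

printf '%s\n' "$result"
```

As with the two-pass version, `for (c in fail)` visits clients in an unspecified order, so pipe the output through `sort` if the order matters. If a client has several Failed rows, only the last one is kept.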