
awk/sed to get a particular field extracted


I have the file data below, which I want to parse and filter to get the volume names from the 4th field. From that field I want to extract the last component, i.e. the string starting with fsvol-.

Data File:

C0BF4EE2-1381
Completed       pod_a0_4    Success volume/fs-04b/fsvol-07     FSx     May 13, 2024, 19:15:08 (UTC+05:30)      May 14, 2024, 03:15:08 (UTC+05:30)
C55114AF-E488
Completed       pod_a0_3    Success volume/fs-04b/fsvol-02     FSx     May 13, 2024, 19:14:56 (UTC+05:30)      May 14, 2024, 03:14:56 (UTC+05:30)
D7AE9EF8-C573
Completed       pod_a0_2    Success volume/fs-04b/fsvol-08     FSx     May 13, 2024, 19:14:44 (UTC+05:30)      May 14, 2024, 03:14:44 (UTC+05:30)
0C662A8C-CD25
Completed       pod_a0_1    Success volume/fs-04b/fsvol-0a     FSx     May 13, 2024, 19:14:33 (UTC+05:30)      May 14, 2024, 03:14:33 (UTC+05:30)

What I tried, with results:

(devcli) $ sed -n  '/^Completed/s/.*\(fsvol-[a-zA-Z0-9]*\).*/\1/p' test-fil1001
fsvol-07
fsvol-02
fsvol-08
fsvol-0a
(devcli) $ awk  '/^Completed/{match($4, /fsvol-[a-zA-Z0-9]+/); print substr($4, RSTART, RLENGTH)}' test-fil1001
fsvol-07
fsvol-02
fsvol-08
fsvol-0a
(devcli) $ awk  '/^Completed/{print $4}' test-fil1001 | cut -d'/' -f3
fsvol-07
fsvol-02
fsvol-08
fsvol-0a

I have tried the commands above and they all work, but I am looking for a shorter or more elegant way to do it.


Solution

  • You can do this with a single awk command:

    awk '/^Completed/ && split($4, a, /\//) >= 3 {print a[3]}' file
    
    fsvol-07
    fsvol-02
    fsvol-08
    fsvol-0a
    
    • Using /^Completed/ we make sure the line starts with Completed.
    • split($4, a, /\//) >= 3 splits 4th column by / character and makes sure that we have at least 3 elements in array a after split.
    • print a[3] prints 3rd element of the split array
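
If the path depth in the 4th field can vary, a small variation on the same idea may be handy: split() returns the number of pieces, so that count can be used as the index of the last one. This is only a sketch, assuming the volume name is always the final /-separated component of $4; the temporary file and the variable names (data, result, n) are made up for illustration and the sample is shortened to two records.

```shell
# Sample input modelled on the question's data (two records only).
data=$(mktemp)
cat > "$data" <<'EOF'
C0BF4EE2-1381
Completed       pod_a0_4    Success volume/fs-04b/fsvol-07     FSx     May 13, 2024, 19:15:08 (UTC+05:30)      May 14, 2024, 03:15:08 (UTC+05:30)
C55114AF-E488
Completed       pod_a0_3    Success volume/fs-04b/fsvol-02     FSx     May 13, 2024, 19:14:56 (UTC+05:30)      May 14, 2024, 03:14:56 (UTC+05:30)
EOF

# split() returns the number of pieces it produced, so for
# "volume/fs-04b/fsvol-07" n is 3 and a[n] is the last piece.
# The "if (n)" guard skips lines whose 4th field is empty.
result=$(awk '/^Completed/ { n = split($4, a, "/"); if (n) print a[n] }' "$data")

printf '%s\n' "$result"
rm -f "$data"
```

Unlike the command in the answer, this does not insist on at least 3 components; it prints whatever comes after the last / in the 4th field, so keep the >= 3 check if you want malformed lines skipped.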