
Printing a field that also contains the FS character?


The records to process in AWK have these possible formats:

foobar has a fixed length, serialno has a variable length, and the field I want to capture may contain zero or more underscores.

foobar_823932230_processname.txt
foobar_82393280_process_name.txt
foobar_8239330_foo_process_name.txt

Desired output

processname
process_name
foo_process_name

If I use FS="[_.]", I can print $3, which works for the first record but not for the second and third.
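The failure is easy to see by printing NF and $3 for each sample record: every underscore inside the wanted name starts a new field, so $3 only ever holds the first piece. A small sketch (the records variable is just a stand-in for the real input file):

```shell
# The three sample records from the question, one per line
records='foobar_823932230_processname.txt
foobar_82393280_process_name.txt
foobar_8239330_foo_process_name.txt'

# With FS="[_.]", NF grows with the number of underscores in the name,
# and $3 is only the part of the name before its first underscore
printf '%s\n' "$records" | awk 'BEGIN { FS = "[_.]" } { print NF, $3 }'
```

For these records this prints "4 processname", "5 process" and "6 foo".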

How can I capture everything between the serial number and .txt?

I'm editing legacy AWK code that needs changing. Once this field is captured correctly, the rest of the awk program goes on to generate additional output.
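Because the rest of the legacy program keeps working on the record, one option is to extract the name into a separate variable and leave $0 and its fields untouched. A minimal sketch, assuming neither the prefix nor the serial number contains an underscore and the extension is always .txt; the helper extract_name and the variable name are hypothetical:

```shell
# Sample records from the question (stand-in for the real input file)
records='foobar_823932230_processname.txt
foobar_82393280_process_name.txt
foobar_8239330_foo_process_name.txt'

# Hypothetical helper: prints the text between the serial number and ".txt".
# Assumes the prefix and the serial number contain no underscores.
extract_name() {
    awk '{
        name = $0                        # work on a copy so $0 and $1..$NF stay intact
        sub(/^[^_]*_[^_]*_/, "", name)   # drop "foobar_" and "<serialno>_"
        sub(/\.txt$/, "", name)          # drop the ".txt" extension
        print name                       # the existing processing could use name here instead
    }'
}

printf '%s\n' "$records" | extract_name
```

Calling sub() on a copy avoids rewriting $0 (and re-splitting its fields), and spelling out the two leading fields avoids the {2} interval expression, which some older awk implementations (older mawk releases, for example) do not support.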


Solution

  • This pipeline of two cut commands should work:

    cut -d_ -f3- file | cut -d. -f1
    
    processname
    process_name
    foo_process_name
    

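    An awk solution can use gsub() with this regex, which deletes the prefix and serial number (the first two underscore-terminated fields) and everything from the first dot onward: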

    awk '{gsub(/^([^_]+_){2}|\..*$/, "")} 1' file
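If the legacy program has to keep FS="[_.]" for its other fields, another option (not part of the answer above) is to rebuild the name from the fields between the serial number and the extension, $3 through $(NF-1), joined back with underscores. A sketch, assuming the name itself never contains a dot (a dot would come back as an underscore); rejoin_name is a hypothetical helper name:

```shell
# Sample records from the question (stand-in for the real input file)
records='foobar_823932230_processname.txt
foobar_82393280_process_name.txt
foobar_8239330_foo_process_name.txt'

# Hypothetical helper: with FS="[_.]", $1 is the prefix, $2 the serial number
# and $NF the "txt" extension, so the wanted name is spread over $3..$(NF-1).
rejoin_name() {
    awk 'BEGIN { FS = "[_.]" }
    {
        name = $3
        for (i = 4; i < NF; i++)   # append the remaining pieces, restoring the underscores
            name = name "_" $i
        print name
    }'
}

printf '%s\n' "$records" | rejoin_name
```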