regex, awk, nawk

In awk or nawk, how do I use the last occurrence of pipe character as field delimiter, giving me 2 fields?


I would prefer to not use gawk-only features as I will need to run this on various UNIX flavors and not all of them have gawk. I have a file with lines like this:

^myfile\..*\.(pork|beef)$|send -d j
^myfile\..*\.(chicken|turkey|quail)$|send -d q
^myfile\..*\.cheese$|send -d u

Sometimes, but not always, the first field contains one or more pipe characters. The characters after the last pipe can reliably be called field 2.


Solution

  • I'm not sure that this is entirely portable but I think it is:

    awk '{
        # Find the position of the last "|" in the line.
        p=match($0, /\|[^|]*$/)
    
        # "Split" the line into two fields around that position.
        a[1]=substr($0, 1, p-1)
        a[2]=substr($0, p+1)
    
        printf "[%s] [%s]\n", a[1], a[2]
    }' file.in
    

    As Ed Morton points out in the comments, the variable p is unnecessary: match() also sets the RSTART variable to the position in the string where the regex matched, so the above could also be written this way:

    awk '{
        # Find the last "|" in the line.
        match($0, /\|[^|]*$/)
    
        # "Split" the line into two fields around that position (using the RSTART variable from the match() call).
        a[1]=substr($0, 1, RSTART-1)
        a[2]=substr($0, RSTART+1)
    
        printf "[%s] [%s]\n", a[1], a[2]
    }' file.in
    

    In fact, effectively this exact task is the example given for match() in the awk Grymoire.
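One case the snippets above do not cover is a line with no pipe at all. There match() returns 0, so RSTART is 0 and the whole line would land in the second element. Below is a minimal sketch of one way to guard against that, run on the sample lines from the question. The function name split_last_pipe and the choice to treat a pipe-less line as field 1 with an empty field 2 are assumptions for illustration, not part of the original answer.

```shell
#!/bin/sh
# split_last_pipe is a hypothetical wrapper name, not from the original answer.
# It splits each input line at the LAST "|" using match()/RSTART only,
# which avoids gawk-specific features.
split_last_pipe() {
    awk '{
        # match() returns 0 when the line contains no "|" at all.
        if (match($0, /\|[^|]*$/)) {
            printf "[%s] [%s]\n", substr($0, 1, RSTART-1), substr($0, RSTART+1)
        } else {
            # Assumption: a pipe-less line is treated as field 1 only.
            printf "[%s] []\n", $0
        }
    }' "$@"
}

# Quoted heredoc delimiter so the shell leaves "$" and "\" untouched.
split_last_pipe <<'EOF'
^myfile\..*\.(pork|beef)$|send -d j
^myfile\..*\.(chicken|turkey|quail)$|send -d q
^myfile\..*\.cheese$|send -d u
no pipe here
EOF
# Prints:
# [^myfile\..*\.(pork|beef)$] [send -d j]
# [^myfile\..*\.(chicken|turkey|quail)$] [send -d q]
# [^myfile\..*\.cheese$] [send -d u]
# [no pipe here] []
```

Because the function passes "$@" through to awk, it can also be given file names, for example split_last_pipe file.in.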