awk split printing

How to split a field and then print the last element using awk

I am trying to edit a file which has this format:

field1 field2 field3 gene_id "xxxxx"; transcript_id "XM_xxxxxxxx.x"; db_xref "GeneID:102885392"; exon_number "1";

I would like as output:

field1 field2 field3 exon_number "1";

I am using awk, but I cannot print the last part of the last field after splitting it. Here is my code:

awk '{split($4,a,";"); print ($1, $2,$3, a[$NF])}' input

I know a[$NF] does not work, but how do I refer to the last subfield, i.e. the last element of the array? (In my file, exon_number is not always the 5th element, but it is always the last one.)

Solution

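exon_number "1" is your second-to-last ;-separated subfield, not your last one, because split() produces an empty string after the final ; you're splitting on. split() returns the number of elements it created, so save that count in n and use a[n-1]. The commands below assume that your first three fields and the attribute string are tab-separated: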
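To see the empty trailing element for yourself, here is a minimal sketch that runs the plain ";" split from the question on one sample line. The sample line and the shell variable names (line, result) are made up for illustration, and the fields are assumed to be tab-separated:

```shell
# Hypothetical sample record; the three leading fields and the attribute
# string are assumed to be separated by tabs.
line=$(printf 'field1\tfield2\tfield3\tgene_id "xxxxx"; transcript_id "XM_xxxxxxxx.x"; db_xref "GeneID:102885392"; exon_number "1";')

# split() returns the element count. Print that count, then the last and
# second-to-last elements in brackets so empty strings and spaces are visible.
result=$(printf '%s\n' "$line" | awk 'BEGIN{FS=OFS="\t"} {
    n = split($4, a, ";")
    print n, "[" a[n] "]", "[" a[n-1] "]"
}')
printf '%s\n' "$result"
```

This should report 5 elements, with a[5] empty and a[4] holding exon_number "1" preceded by a space. That leading space is why it helps to split on a regexp that also consumes the whitespace around each ";".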

awk 'BEGIN{FS=OFS="\t"} {n=split($4,a,/[[:space:]]*;[[:space:]]*/); print $1, $2, $3, a[n-1]";"}' input

or:

awk 'BEGIN{FS=OFS="\t"} {n=split($4,a,/[[:space:]]*;[[:space:]]*/); $4=a[n-1]";"; print}' input
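If you are not sure every line ends with a ";", a possible variation (a sketch, not something your sample input requires) is to drop the last element only when it is empty. The two records below are invented for illustration, the fields are again assumed to be tab-separated, and the separator regexp uses [ \t] in place of [[:space:]] purely to keep the sketch friendly to older awks:

```shell
# Two hypothetical records: the first ends with ";", the second does not.
input=$(printf 'f1\tf2\tf3\tgene_id "g1"; exon_number "1";\nf1\tf2\tf3\tgene_id "g2"; exon_number "2"')

output=$(printf '%s\n' "$input" | awk 'BEGIN{FS=OFS="\t"} {
    n = split($4, a, /[ \t]*;[ \t]*/)
    if (n > 1 && a[n] == "") n--    # skip the empty element left by a trailing ";"
    print $1, $2, $3, a[n] ";"
}')
printf '%s\n' "$output"
```

Both records should come out with their exon_number subfield as the 4th field, each terminated by a single ";".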

See split() at https://www.gnu.org/software/gawk/manual/gawk.html#String-Functions