Search code examples
regexlinuxshelldelimiter

File fields and columns adjustment with awk [LINUX]


I have an issue with columns delimiters adjustment in a file in linux into a database.

I need 14 columns and I use "|" as a delimiter so I applied :

awk -F'|' '{missing=14-NF;if(missing==0){print $0}else{printf "%s",$0;for(i=1;i<=missing-1;i++){printf "|"};print "|"}}' myFile

Suppose I have a row like that:

a|b|c|d|e||f||g||||h|i|

after applying the awk command it will be:

a|b|c|d|e||f||g||||h|i||

and this is not acceptable I need the data to be 14 columns only.

Sample input {In case of 14 fields row]:

a|b|c|d|e||f||g||||h|i

Do nothing

Sample input {In case of extra fields]:

a|b|c|d|e||f||g||||h|i|

ouput:

a|b|c|d|e||f||g||||h|i

Sample Input {In case of less fields}:

a|b|c|d||e||f||g|h

output:

a|b|c|d||e||f||g|h|||

Solution

  • You may use this gnu-awk solution:

    awk -v n=14 '
    BEGIN {FS=OFS="|"}
    {
       $0 = gensub(/^(([^|]*\|){13}[^|]*)\|.*/, "\\1", "1")
       for (i=NF+1; i<=n; ++i)
          $i = ""
    } 1' file
    
    a|b|c|d|e||f||g||||h|i
    a|b|c|d|e||f||g||||h|i
    a|b|c|d||e||f||g|h|||
    

    Where original file is this:

    cat file
    
    a|b|c|d|e||f||g||||h|i
    a|b|c|d|e||f||g||||h|i|
    a|b|c|d||e||f||g|h
    

    Here:

    • Using gnsub we remove all extra fields
    • Using for loop we create new fields to make NF = n

    If you don't have gnu-awk then following should work on non-gnu awk (tested on BSD awk):

    awk -v n=14 '
    BEGIN {FS=OFS="|"}
    {
       for (i=NF+1; i<=n; ++i) $i=""
       for (i=n+1; i<=NF; ++i) $i=""
       NF = n
    } 1' file