Search code examples
awkcolumnsorting

Awk: Create columns based on current column value


Need some awk-fu! I have the following data structure:

ybcxl,05,06,07,08,09,10,11  
yxxu,01  
yxxu,03,05,06,07,08,09,10,11  
ybban,01,03,04,05,06,07,08  
zxvhu,01,02,03,04,05,06,07,08,09,10,11,12  

Each line should have 13 columns. Starting at Col 2, I need to asses the value. The value should always be the column number minus 1. In other words, Column 2 should have a value of 1. I need to create columns (by inserting commas) so that each line has the appropriate number of columns with the existing values ordered correctly (col number minus 1 with an offset of the 1st record).

Example of corrected data:

yxxu,01,,,,,,,,,,  
yxxu,,,03,,05,06,07,08,09,10,11,  

Solution

  • $ cat af.txt
    ybcxl,05,06,07,08,09,10,11
    yxxu,01
    yxxu,03,05,06,07,08,09,10,11
    ybban,01,03,04,05,06,07,08
    zxvhu,01,02,03,04,05,06,07,08,09,10,11,12
    
    $ cat af.awk
    BEGIN { FS="," }
    {
        printf "%s", $1
        for (i=2; i<=NF; ++i) a[$i+0] = $i
        for (i=1; i<=12; ++i) printf ",%s", (i in a ? a[i] : "")
        print ""
        delete a
    }
    
    $ awk -f af.awk af.txt
    ybcxl,,,,,05,06,07,08,09,10,11,
    yxxu,01,,,,,,,,,,,
    yxxu,,,03,,05,06,07,08,09,10,11,
    ybban,01,,03,04,05,06,07,08,,,,
    zxvhu,01,02,03,04,05,06,07,08,09,10,11,12
    

    Test to make sure that all the rows have 13 columns:

    $ awk -f af.awk af.txt | awk -F, '{ print NF }'
    13
    13
    13
    13
    13
    

    Test to make sure column numbers are one more than their corresponding field value (when present):

    $ awk -f af.awk af.txt | awk -F, '{ error = 0; for(i=2;i<=13;++i) { if ($i && i != ($i + 1)) error = 1 } print (error ? "ERROR!" : "GOOD!")}'
    GOOD!
    GOOD!
    GOOD!
    GOOD!
    GOOD!
    

    Explanation:

    BEGIN { FS="," }
    Set field separator to comma

    printf "%s", $1
    Output the first field (using printf to avoid a newline)

    for (i=2; i<=NF; ++i) a[$i+0] = $i
    Store all the numbers present in the record into an array, using their numeric equivalents as the indices ($i+0 converts string into number)

    for (i=1; i<=12; ++i) printf ",%s", (i in a ? a[i] : "")
    For each field we want to output, if the number is in the array print it, otherwise, just print the comma

    print ""
    Print a newline to finish the output record

    delete a
    Clear the array before reading the next input record