Search code examples
awksed

Replace the values under some columns using awk or sed


I have a data, which contains some strange values under the columns start with "AS_".

It is a csv file, but for understanding I am putting the files with space,

sl no   AS_2f   AD_2f   SR_2f   SV_2f   AS_5f   AD_5f   SR_5f   SV_5f                                                   1.0     3.0     5.0     35.0    9.0     11.0    13.0    15.0    17.0
2.0     3.0     4.0     5.0     6.0     107.0   8.0     204.0   10.0
3.0     3.4     34.0    4.2     4.6     5.0     5.4     5.8     6.2
4.0     3.0     2.0     1.0     0.0     -1.0    -2.0    204.0   -4.0
5.0     24.5    44.0    63.5    83.0    102.5   122.0   141.5   161.0
6.0     32.0    58.0    84.0    110.0   136.0   162.0   188.0   214.0 

cat << EOF > data.csv
sl no,AS_2f,AD_2f,SR_2f,SV_2f,AS_5f,AD_5f,SR_5f,SV_5f
1.0,3.0,5.0,35.0,9.0,11.0,13.0,15.0,17.0
2.0,3.0,4.0,5.0,6.0,107.0,8.0,204.0,10.0
3.0,3.4,34.0,4.2,4.6,5.0,5.4,5.8,6.2
4.0,3.0,2.0,1.0,0.0,-1.0,-2.0,204.0,-4.0
5.0,24.5,44.0,63.5,83.0,102.5,122.0,141.5,161.0
6.0,32.0,58.0,84.0,110.0,136.0,162.0,188.0,214.0
EOF

I would like to replace the values with greater than 20 with "nan" under all "AS_" columns (AS_2f, AS_5f, AS_9f, etc.). There are many more AS_ columns.

I can do it for AS_2f only, I can't able to do it for other AS_ columns. My script is.

awk -F',' 'NR==1 {print} NR>1 {for(i=2; i<=NF; i++) if ($i > 20 && i==2) $i="nan"; print}' data.csv

Desire output:

sl no,AS_2f,AD_2f,SR_2f,SV_2f,AS_5f,AD_5f,SR_5f,SV_5f
1.0,3.0,5.0,35.0,9.0,11.0,13.0,15.0,17.0
2.0,3.0,4.0,5.0,6.0,nan,8.0,204.0,10.0
3.0,3.4,34.0,4.2,4.6,5.0,5.4,5.8,6.2
4.0,3.0,2.0,1.0,0.0,-1.0,-2.0,204.0,-4.0
5.0,nan,44.0,63.5,83.0,nan,122.0,141.5,161.0
6.0,nan,58.0,84.0,110.0,nan,162.0,188.0,214.0

Solution

  • awk '
        BEGIN { FS = OFS = "," }
    
        {
            # perform any substitutions
            # (does nothing when NR==1 because c is empty)
            for (i in c)
                if ($i>20)
                    $i="nan"
        
            # apply OFS (optional if FS==OFS)
            $1=$1
    
            # output
            print
        }
    
        # save column numbers to check
        NR==1 {
            for (i=1; i<=NF; i++)
                if ($i~/^AS_/)
                    c[i]
        }
    ' data.csv