I have a CSV file that contains some strange values in the columns whose names start with "AS_".
For readability, here is the same data with the commas replaced by spaces:
sl no AS_2f AD_2f SR_2f SV_2f AS_5f AD_5f SR_5f SV_5f
1.0 3.0 5.0 35.0 9.0 11.0 13.0 15.0 17.0
2.0 3.0 4.0 5.0 6.0 107.0 8.0 204.0 10.0
3.0 3.4 34.0 4.2 4.6 5.0 5.4 5.8 6.2
4.0 3.0 2.0 1.0 0.0 -1.0 -2.0 204.0 -4.0
5.0 24.5 44.0 63.5 83.0 102.5 122.0 141.5 161.0
6.0 32.0 58.0 84.0 110.0 136.0 162.0 188.0 214.0
cat << EOF > data.csv
sl no,AS_2f,AD_2f,SR_2f,SV_2f,AS_5f,AD_5f,SR_5f,SV_5f
1.0,3.0,5.0,35.0,9.0,11.0,13.0,15.0,17.0
2.0,3.0,4.0,5.0,6.0,107.0,8.0,204.0,10.0
3.0,3.4,34.0,4.2,4.6,5.0,5.4,5.8,6.2
4.0,3.0,2.0,1.0,0.0,-1.0,-2.0,204.0,-4.0
5.0,24.5,44.0,63.5,83.0,102.5,122.0,141.5,161.0
6.0,32.0,58.0,84.0,110.0,136.0,162.0,188.0,214.0
EOF
I would like to replace every value greater than 20 with "nan" in all the "AS_" columns (AS_2f, AS_5f, AS_9f, etc.); the real file has many more AS_ columns. I can do it for AS_2f only; I can't manage it for the other AS_ columns. My script is:
awk -F',' 'NR==1 {print} NR>1 {for(i=2; i<=NF; i++) if ($i > 20 && i==2) $i="nan"; print}' data.csv
Desired output:
sl no,AS_2f,AD_2f,SR_2f,SV_2f,AS_5f,AD_5f,SR_5f,SV_5f
1.0,3.0,5.0,35.0,9.0,11.0,13.0,15.0,17.0
2.0,3.0,4.0,5.0,6.0,nan,8.0,204.0,10.0
3.0,3.4,34.0,4.2,4.6,5.0,5.4,5.8,6.2
4.0,3.0,2.0,1.0,0.0,-1.0,-2.0,204.0,-4.0
5.0,nan,44.0,63.5,83.0,nan,122.0,141.5,161.0
6.0,nan,58.0,84.0,110.0,nan,162.0,188.0,214.0
awk '
BEGIN { FS = OFS = "," }
{
    # perform any substitutions
    # (does nothing on the header line, because c is still empty when NR==1)
    for (i in c)
        if ($i > 20)
            $i = "nan"
    # force awk to rebuild the record with OFS
    # (optional here, since FS == OFS)
    $1 = $1
    # output
    print
}
# on the header line, save the numbers of the columns to check;
# merely referencing c[i] creates that array element
NR == 1 {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^AS_/)
            c[i]
}
' data.csv
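As a quick end-to-end check, the snippet below recreates `data.csv` from the question and runs the same program; on this sample it prints the desired output shown above, with `nan` only in the AS_ columns (fields 2 and 6):

```shell
# Recreate the sample data from the question.
cat << 'EOF' > data.csv
sl no,AS_2f,AD_2f,SR_2f,SV_2f,AS_5f,AD_5f,SR_5f,SV_5f
1.0,3.0,5.0,35.0,9.0,11.0,13.0,15.0,17.0
2.0,3.0,4.0,5.0,6.0,107.0,8.0,204.0,10.0
3.0,3.4,34.0,4.2,4.6,5.0,5.4,5.8,6.2
4.0,3.0,2.0,1.0,0.0,-1.0,-2.0,204.0,-4.0
5.0,24.5,44.0,63.5,83.0,102.5,122.0,141.5,161.0
6.0,32.0,58.0,84.0,110.0,136.0,162.0,188.0,214.0
EOF

# Run the answer's awk program: collect the AS_ column numbers from the
# header into c, then blank out any value > 20 in those columns.
awk '
BEGIN { FS = OFS = "," }
{
    for (i in c)
        if ($i > 20)
            $i = "nan"
    print
}
NR == 1 {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^AS_/)
            c[i]
}
' data.csv
```

Note that the header row passes through untouched: awk runs the rules in order for each record, so on the first line the substitution block sees an empty `c`, and only afterwards does the `NR==1` rule populate it.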