i'm running the following code to manipulate numeric data in a vcf table.
cat inputfile | while read row; do
echo $row > tmp
originalProb= `awk '{print $1}' tmp`
probabilityHom1=`awk '{print $2}' tmp`
probabilityHom2=`awk '{print $4}' tmp`
numCols=`awk '{print NF}' tmp`
if [ $numCols -gt 4 ]; then
echo "${originalProb}" >> currentRowGenotypes
elif [ "$probabilityHom1" -gt "$probabilityHom2" ]; then
echo "1/1" >> currentRowGenotypes
elif [ "$probabilityHom1" -lt "$probabilityHom2" ]; then
echo "0/0" >> currentRowGenotypes
elif [ "$probabilityHom1" -eq "$probabilityHom2" ] && [ "$probabilityHom1" -eq 0 ]; then
echo "${originalProb}" >> currentRowGenotypes
else
echo "het" >> currentRowGenotypes
fi
done
cat tmpHeaders currentRowGenotypes > currentFullCol
the input file looks like this
1/1 255 231 0
0/1 255 0 152
0/1 255 0 82
0/1 255 0 151
0/1 239 0 31
0/1 255 0 255
for some reason the awk command doesn't recognize the first column. any suggestions ?
It is not a good idea to create a temporary file just to make awk
split the line into columns because:
awk
.bash
and awk
may be the cause of mistake.You can do it without using awk
. Please try the following:
while read -ra row; do
originalProb="${row[0]}"
probabilityHom1="${row[1]}"
probabilityHom2="${row[3]}"
numCols="${#row}"
if (( numCols > 4 )); then
echo "$originalProb" >> currentRowGenotypes
elif (( probabilityHom1 > probabilityHom2 )); then
echo "1/1" >> currentRowGenotypes
elif (( probabilityHom1 < probabilityHom2 )); then
echo "0/0" >> currentRowGenotypes
elif (( probabilityHom1 == probabilityHom2 && probabilityHom1 == 0 )); then
echo "$originalProb" >> currentRowGenotypes
else
echo "het" >> currentRowGenotypes
fi
done < inputfile
cat tmpHeaders currentRowGenotypes > currentFullCol
As others repeatedly suggest, a better way will be to write with awk
:
awk '{
originalProb = $1
probabilityHom1 = $2
probabilityHom2 = $4
numCols = NF
if ( numCols > 4 )
print originalProb >> "currentRowGenotypes"
else if ( probabilityHom1 > probabilityHom2 )
print "1/1" >> "currentRowGenotypes"
else if ( probabilityHom1 < probabilityHom2 )
print "0/0" >> "currentRowGenotypes"
else if ( probabilityHom1 == probabilityHom2 && probabilityHom1 == 0 )
print originalProb >> "currentRowGenotypes"
else
print "het" >> "currentRowGenotypes"
}' inputfile
cat tmpHeaders currentRowGenotypes > currentFullCol
Hope this helps.