I am using gnuplot combined with awk to draw a 2D bar plot from the following input data:
#Acceptor DonorH Donor Frames Frac AvgDist AvgAng
lig_608@O3 HIE_163@HE2 HIE_163@NE2 498 0.5304 2.8317 153.0580
lig_608@O GLU_166@H GLU_166@N 476 0.5069 2.8858 161.7174
lig_608@O1 HIE_41@HE2 HIE_41@NE2 450 0.4792 2.8484 158.5193
THR_26@O lig_608@H9 lig_608@N1 399 0.4249 2.8312 149.9578
lig_608@O2 THR_26@H THR_26@N 312 0.3323 2.9029 164.8033
lig_608@O1 ASN_142@HD21 ASN_142@ND2 14 0.0149 2.8445 158.4224
lig_608@O1 GLN_189@HE22 GLN_189@NE2 2 0.0021 2.8562 149.7421
lig_608@O1 GLN_189@HE21 GLN_189@NE2 1 0.0011 2.7285 158.4377
lig_608@O3 GLY_143@H GLY_143@N 1 0.0011 2.7421 147.8213
My script takes the data from the 3rd and 5th columns, keeping only the lines where the value in the 5th column is greater than 0.05, and produces a bar graph:
cat <<EOS | gnuplot > graph.png
set term pngcairo size 800,600
set xtics noenhanced
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk 'NR == 1 || \$5 > 0.05' $file" using 0:5:xtic(3) with boxes
EOS
EDIT: within my bash workflow the script looks like this:
for file in "${output}"/${target}*.log ; do
file_name3=$(basename "$file")
file_name2="${file_name3/.log/}"
file_name="${file_name2/${target}_/}"
echo "visualization with Gnuplot!"
cat <<EOS | gnuplot > ${output}/${file_name2}.png
set title "$file_name" font "Century,22" textcolor "#b8860b"
set tics font "Helvetica,12"
#set term pngcairo size 1280,960
set term pngcairo size 800,600
set yrange [0:1]
set xtics noenhanced
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk 'NR == 1 || \$5 > 0.05' $file" using 0:5:xtic(3) with boxes
EOS
done
This is the image produced from the following filtered data:
HIE_163@NE2 0.5304
GLU_166@N 0.5069
HIE_41@NE2 0.4792
lig_608@N1 0.4249
THR_26@N 0.3323
I need to modify the awk expression embedded in the gnuplot command that selects the two columns from the whole data set. Instead of always taking the label from the third column (Donor), I need to take it from either the first (#Acceptor) or the third (Donor) column, depending on where the lig_* pattern appears. If the value in the #Acceptor column starts with lig, the label should come from the third column (Donor) of the same line, and vice versa (the lig_* pattern occurs in either the 1st or the 3rd column, never in both). For my example, the filtered data with the updated selection should become:
HIE_163@NE2 0.5304 # the first index from the third column
GLU_166@N 0.5069 # the first index from the third column
HIE_41@NE2 0.4792 # the first index from the third column
THR_26@O 0.4249 # !!!! the first index from the first column !!
THR_26@N 0.3323 # the first index from the third column
As you potentially want to do more complicated processing with awk, I would suggest an alternative way of mixing awk and gnuplot.
Gnuplot supports including inline data in its script files, so you could have awk generate the inline data while supplying the plot configuration with bash, all done in a sub-shell. For example:
(
printf '$data << EOD\n'
awk 'NR>1 && $5>0.05 { print $1 ~ /^lig/ ? $3 : $1, $5 }' infile
cat << EOS
EOD
set term pngcairo size 1280,960 font ",20"
set output "output.png"
set xtics noenhanced
set ytics 0.02
set grid y
set key off
set style fill solid 0.5
set boxwidth 0.9
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
plot "\$data" using 0:2:xtic(1) with boxes, "" using 0:2:2 with labels offset 0,1
EOS
)
This would produce the following gnuplot script:
$data << EOD
HIE_163@NE2 0.5304
GLU_166@N 0.5069
HIE_41@NE2 0.4792
THR_26@O 0.4249
THR_26@N 0.3323
EOD
set term pngcairo size 1280,960 font ",20"
set output "output.png"
set xtics noenhanced
set ytics 0.02
set grid y
set key off
set style fill solid 0.5
set boxwidth 0.9
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
plot "$data" using 0:2:xtic(1) with boxes, "" using 0:2:2 with labels offset 0,1
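The label selection in the awk step can be checked on its own before any plotting. The sketch below wraps it in a shell function; the name `hbond_pairs` and the optional threshold argument are illustrative assumptions, not part of the original script. When the acceptor (column 1) starts with `lig`, the donor (column 3) is used as the label; otherwise the acceptor is used.

```shell
# Hypothetical helper (name and arguments are assumptions for illustration).
# $1 = input file, $2 = minimum fraction (defaults to 0.05)
hbond_pairs() {
    awk -v min="${2:-0.05}" '
        # Skip the header line and keep rows whose Frac (column 5) exceeds min.
        NR > 1 && $5 > min {
            # Ligand is the acceptor: label with the donor (column 3).
            # Otherwise the ligand is the donor: label with the acceptor (column 1).
            label = ($1 ~ /^lig/) ? $3 : $1
            print label, $5
        }' "$1"
}
```

Running `hbond_pairs infile` on the data from the question should print the same five label/fraction pairs that appear between `$data << EOD` and `EOD` in the generated script.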
Pipe the output of the sub-shell to gnuplot, i.e. ( ... ) | gnuplot, and you get this in output.png:
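To fit this into the per-file loop from the question, the sub-shell can be turned into a function that prints the gnuplot script for one log file. This is only a sketch under stated assumptions: the function name `make_plot_script` and its three arguments are hypothetical, the plot settings are abbreviated, and inline data blocks need gnuplot 5.0 or later as far as I know.

```shell
# Hypothetical helper (name and arguments are assumptions for illustration).
# $1 = input log file, $2 = output PNG path, $3 = plot title
make_plot_script() {
    # Inline data block: "label fraction" pairs produced by awk.
    printf '$data << EOD\n'
    awk 'NR > 1 && $5 > 0.05 {
        label = ($1 ~ /^lig/) ? $3 : $1
        print label, $5
    }' "$1"
    printf 'EOD\n'
    # Plot configuration; \$data is escaped so the shell leaves it for gnuplot.
    cat << EOS
set term pngcairo size 800,600
set output "$2"
set title "$3"
set yrange [0:1]
set xtics noenhanced
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "\$data" using 0:2:xtic(1) with boxes
EOS
}

# Possible use inside the loop from the question (requires gnuplot):
# for file in "${output}"/${target}*.log; do
#     name=$(basename "$file" .log)
#     make_plot_script "$file" "${output}/${name}.png" "${name#${target}_}" | gnuplot
# done
```

Printing the script instead of piping it straight away makes it easy to inspect what gnuplot will receive, which helps when a plot comes out empty or mislabeled.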