
gnuplot/awk: plotting a bar graph for filtered data


I am using gnuplot combined with AWK to produce a 2D bar plot from the following input data:

#Acceptor                DonorH           Donor   Frames         Frac      AvgDist       AvgAng
lig_608@O3          HIE_163@HE2     HIE_163@NE2      498       0.5304       2.8317     153.0580
lig_608@O             GLU_166@H       GLU_166@N      476       0.5069       2.8858     161.7174
lig_608@O1           HIE_41@HE2      HIE_41@NE2      450       0.4792       2.8484     158.5193
THR_26@O             lig_608@H9      lig_608@N1      399       0.4249       2.8312     149.9578
lig_608@O2             THR_26@H        THR_26@N      312       0.3323       2.9029     164.8033
lig_608@O1         ASN_142@HD21     ASN_142@ND2       14       0.0149       2.8445     158.4224
lig_608@O1         GLN_189@HE22     GLN_189@NE2        2       0.0021       2.8562     149.7421
lig_608@O1         GLN_189@HE21     GLN_189@NE2        1       0.0011       2.7285     158.4377
lig_608@O3            GLY_143@H       GLY_143@N        1       0.0011       2.7421     147.8213

My script takes the data from the 3rd and 5th columns, keeping only the lines where the value in the 5th column is > 0.05, and produces a bar graph:

cat <<EOS | gnuplot > graph.png
set term pngcairo size 800,600
set xtics noenhanced
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk 'NR == 1 || \$5 > 0.05' $file" using 0:5:xtic(3) with boxes
EOS
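To check exactly what the filter hands to gnuplot, the same awk condition can be run on its own. A minimal sketch, wrapped in a hypothetical helper `preview_filter` that prints only the two columns gnuplot uses (column 3 as the xtic label, column 5 as the bar height) and skips the header line:

```shell
#!/usr/bin/env bash
# Print the label/value pairs that the gnuplot plot line uses:
# same condition as in the plot command ($5 > 0.05), but only
# column 3 (xtic label) and column 5 (bar height) are printed.
# NR > 1 skips the "#Acceptor ..." header line.
preview_filter() {
    awk 'NR > 1 && $5 > 0.05 { print $3, $5 }' "$1"
}
```

Running `preview_filter "$file"` on the sample input should print the HIE_163@NE2 ... THR_26@N label/value pairs that end up in the graph.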

EDIT: within my bash workflow, the script looks like this:

for file in "${output}"/${target}*.log ; do
    file_name3=$(basename "$file")
    file_name2="${file_name3/.log/}"
    file_name="${file_name2/${target}_/}"
    echo "visualization with Gnuplot!"
    cat <<EOS | gnuplot > "${output}/${file_name2}.png"
set title "$file_name" font "Century,22" textcolor "#b8860b"
set tics font "Helvetica,12"
#set term pngcairo size 1280,960
set term pngcairo size 800,600
set yrange [0:1]
set xtics noenhanced
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk 'NR == 1 || \$5 > 0.05' $file" using 0:5:xtic(3) with boxes
EOS
done
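The parameter expansions at the top of the loop turn each log path into the PNG base name and the plot title. The sketch below wraps the same steps in a hypothetical helper, `derive_names`, so they can be checked on a single path. It assumes a hypothetical target `prot` and file `results/prot_lig608.log`, and it uses the anchored forms `${var%.log}` and `${var#prefix}` instead of `${var/pattern/}`, so only a trailing `.log` and a leading `<target>_` are removed rather than the first match anywhere in the name.

```shell
#!/usr/bin/env bash
# Derive the PNG base name and the plot title from a log path,
# following the steps of the loop (with anchored expansions).
# Prints two lines: the name without ".log", then that name
# without the leading "<target>_" prefix.
derive_names() {
    local file=$1 target=$2
    local file_name3 file_name2 file_name
    file_name3=$(basename "$file")            # e.g. prot_lig608.log
    file_name2="${file_name3%.log}"           # drop the trailing .log
    file_name="${file_name2#"${target}"_}"    # drop the leading "<target>_"
    printf '%s\n%s\n' "$file_name2" "$file_name"
}
```

With these assumptions, `derive_names results/prot_lig608.log prot` prints `prot_lig608` (the PNG base name) and `lig608` (the title).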


The resulting graph is produced from the following filtered data:

HIE_163@NE2 0.5304
GLU_166@N 0.5069
HIE_41@NE2 0.4792
lig_608@N1 0.4249
THR_26@N 0.3323

I need to modify the awk filter embedded in the gnuplot command, which selects the two columns from the whole data set. Instead of always taking the label from the third column (Donor), I need to take it from either the first (#Acceptor) or the third (Donor) column, depending on which one matches the lig_* pattern: if the value in the #Acceptor column starts with lig, I need the value from the Donor column of the same line, and vice versa (the lig* pattern appears in either the 1st or the 3rd column, never in both). For my example, the filtered data with the updated selection should become:

HIE_163@NE2 0.5304 # label taken from the 3rd column (Donor)
GLU_166@N 0.5069 # label taken from the 3rd column (Donor)
HIE_41@NE2 0.4792 # label taken from the 3rd column (Donor)
THR_26@O 0.4249 # !!! label taken from the 1st column (#Acceptor) !!!
THR_26@N 0.3323 # label taken from the 3rd column (Donor)
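The rule above relies on the stated assumption that lig* appears in exactly one of the two columns. A quick sanity check, sketched here as a hypothetical helper `check_lig_columns`, prints every data line where the pattern appears in both columns or in neither, so such rows can be inspected before plotting:

```shell
#!/usr/bin/env bash
# List data lines that break the assumption "lig* appears in exactly
# one of column 1 (#Acceptor) or column 3 (Donor)".
# Prints "line <NR>: <line>" for each offending row; no output means
# the assumption holds for the whole file. Blank lines are skipped.
check_lig_columns() {
    awk 'NR > 1 && NF > 0 {
        a = ($1 ~ /^lig/)   # 1 if the acceptor is the ligand, else 0
        d = ($3 ~ /^lig/)   # 1 if the donor is the ligand, else 0
        if (a + d != 1) print "line " NR ": " $0
    }' "$1"
}
```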

Solution

  • Since you may want to do more complicated processing with awk later, I would suggest an alternative way of combining awk and gnuplot.

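    Gnuplot (5.0 and newer) supports inline data blocks in its scripts, so you can have awk generate the data block while bash supplies the plot configuration, all inside a sub-shell. In the awk part, `$1 ~ /^lig/ ? $3 : $1` takes the Donor column when the Acceptor is the ligand and the Acceptor column otherwise, and `NR>1` skips the header line. For example: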

    (
    printf '$data << EOD\n'
    
    awk 'NR>1 && $5>0.05 { print $1 ~ /^lig/ ? $3 : $1, $5 }' infile
    
    cat << EOS
    EOD
    
    set term pngcairo size 1280,960 font ",20"
    set output "output.png"
    
    set xtics noenhanced
    set ytics 0.02
    set grid y
    set key off
    
    set style fill solid 0.5
    set boxwidth 0.9
    
    set xlabel "Fraction, %"
    set ylabel "H-bond donor, residue"
    
    plot "\$data" using 0:2:xtic(1) with boxes, "" using 0:2:2 with labels offset 0,1
    EOS
    )
    

    This produces the following gnuplot script:

    $data << EOD
    HIE_163@NE2 0.5304
    GLU_166@N 0.5069
    HIE_41@NE2 0.4792
    THR_26@O 0.4249
    THR_26@N 0.3323
    EOD
    
    set term pngcairo size 1280,960 font ",20"
    set output "output.png"
    
    set xtics noenhanced
    set ytics 0.02
    set grid y
    set key off
    
    set style fill solid 0.5
    set boxwidth 0.9
    
    set xlabel "Fraction, %"
    set ylabel "H-bond donor, residue"
    
    plot "$data" using 0:2:xtic(1) with boxes, "" using 0:2:2 with labels offset 0,1
    

    Pipe it to gnuplot, i.e. (...) | gnuplot, and you get this in output.png: