
How to extract values from a formatted table with gnuplot?


Using the following multi-column data as input:

| Run on Thu Oct 20 14:59:37 2022
|| GB non-polar solvation energies calculated with gbsa=2
idecomp = 1: Per-residue decomp adding 1-4 interactions to Internal.
Energy Decomposition Analysis (All units kcal/mol): Generalized Born solvent

DELTAS:
Total Energy Decomposition:
Residue |  Location |       Internal      |    van der Waals    |    Electrostatic    |   Polar Solvation   |    Non-Polar Solv.  |       TOTAL
-------------------------------------------------------------------------------------------------------------------------------------------------------
SER   1 | R SER   1 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.092 +/-  0.012 |    0.092 +/-  0.012 |    0.000 +/-  0.000 |    0.000 +/-  0.001
GLY   2 | R GLY   2 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |    0.001 +/-  0.001 |   -0.001 +/-  0.001 |    0.000 +/-  0.000 |    0.000 +/-  0.001
PHE   3 | R PHE   3 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.003 +/-  0.001 |    0.004 +/-  0.001 |    0.000 +/-  0.000 |    0.000 +/-  0.001
ARG   4 | R ARG   4 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.160 +/-  0.025 |    0.164 +/-  0.025 |    0.000 +/-  0.000 |    0.003 +/-  0.001
LYS   5 | R LYS   5 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.211 +/-  0.038 |    0.230 +/-  0.038 |    0.000 +/-  0.000 |    0.019 +/-  0.004
MET   6 | R MET   6 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.006 +/-  0.003 |    0.010 +/-  0.003 |    0.000 +/-  0.000 |    0.004 +/-  0.001
ALA   7 | R ALA   7 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.019 +/-  0.003 |    0.023 +/-  0.003 |    0.000 +/-  0.000 |    0.003 +/-  0.001
PHE   8 | R PHE   8 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |    0.020 +/-  0.003 |   -0.018 +/-  0.003 |    0.000 +/-  0.000 |    0.001 +/-  0.001
PRO   9 | R PRO   9 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |    0.002 +/-  0.002 |    0.002 +/-  0.003 |    0.000 +/-  0.000 |    0.004 +/-  0.001
SER  10 | R SER  10 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |    0.003 +/-  0.004 |   -0.009 +/-  0.004 |    0.000 +/-  0.000 |   -0.007 +/-  0.002

I call awk to extract the information from the first and eighth columns, which reduces the input to two-column data and slightly modifies its format:

SER_1 0.000
GLY_2 0.000
PHE_3 0.000
ARG_4 0.003
LYS_5 0.019
MET_6 0.004
ALA_7 0.003
PHE_8 0.001
PRO_9 0.004
SER_10 -0.007

Then I try to plot a 2D bar chart using gnuplot combined with awk, integrated into a bash script:

echo "visualization with Gnuplot + AWK (ver 2): plot data from stdin!"
{
echo '$data << EOD'
# reduce the input to two columns and rename the IDs
awk '
NF==8 { gsub(/^[[:space:]]+|[[:space:]]+$/,"",$1)        # strip leading/trailing spaces from 1st field
        gsub(/[[:space:]]+/,"_",$1)                      # convert all contiguous spaces to a single '_'

        gsub(/^[[:space:]]+|[[:space:]]+$/,"",$8)        # strip leading/trailing spaces from 8th field
        split($8,a,"[[:space:]]")                        # split 8th field on white space

        if (a[1]+0 == a[1] && a[1] > -10 )              # if 1st sub-field is numeric and > -10 then ...
           print $1,a[1]                                 # print to stdout
      }
' $file |
if [ "$SORT_BARS" = 1 ]; then sort -k1,1; else cat 
fi |
if [ "$COLOR_DATA" = 1 ]; then
awk -v colors="$color1 $color2 $color3 $color4 $color5 $color6" '
    BEGIN { nc = split(colors,clrArr) }
    { print $0, clrArr[NR % nc + 1] }
'
else cat; fi
echo 'EOD'

cat << EOF
set term pngcairo size 800,600
set title "$file_name" noenhanced font "Century,22" textcolor "#b8860b"
set xtics noenhanced font "Helvetica,10"
set xlabel "Residue, #"
set ylabel "dG, kcal/mol"
set yrange [0:-8]
set ytics 0.1
set grid y
set key off
set boxwidth 0.9
set style fill solid 0.5
plot \$data using 0:2:3:xtic(1) with boxes lc rgb var, \
        '' using 0:2:2 with labels offset 0,1
EOF
} | gnuplot > ${output}/${file_name2}.png

which produces the following error

gnuplot> plot $data using 0:2:3:xtic(1) with boxes lc rgb var
                                                             ^
         line 1: x range is invalid

I previously used this script to plot the same kind of graph from the same kind of input data, but with positive values only. How can I adapt it to the new format?

The resulting graph should look something like the one I produced with xmgrace, without bar coloring.


Solution

  • Just as an example: although gnuplot is primarily a plotting program, it can also do some data processing without the help of external tools. Extracting the values from your input data with gnuplot looks (at least to me) easier than your awk script. Sorting, however, is not gnuplot's strong point, so depending on the kind of sort you need, you might be back to external tools.

    If your table has a strictly regular structure you could do the following:

    Version 1: set the column separator to "|" and split each column further with word() (check help datafile separator and help word).

    Version 2: keep the default column separator (whitespace); the strings and the number to extract are then in columns 1, 2, and 28 (check help strcol and help column).

    Do not forget to skip the 9 header lines (check help skip).
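
    The column numbers used in the whitespace-separated version can be cross-checked outside gnuplot. The following is only a minimal sketch, assuming a POSIX shell with awk; the temporary file and the variable names sample and extracted are hypothetical, and only two data rows from the table are used:

    ```shell
    # Write the 9 header lines plus two data rows to a temporary file (hypothetical name).
    sample=$(mktemp)
    printf '%s\n' \
      '| Run on Thu Oct 20 14:59:37 2022' \
      '|| GB non-polar solvation energies calculated with gbsa=2' \
      'idecomp = 1: Per-residue decomp adding 1-4 interactions to Internal.' \
      'Energy Decomposition Analysis (All units kcal/mol): Generalized Born solvent' \
      '' \
      'DELTAS:' \
      'Total Energy Decomposition:' \
      'Residue |  Location |       Internal      |    van der Waals    |    Electrostatic    |   Polar Solvation   |    Non-Polar Solv.  |       TOTAL' \
      '-------------------------------------------------------------------------------------------------------------------------------------------------------' \
      'LYS   5 | R LYS   5 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.211 +/-  0.038 |    0.230 +/-  0.038 |    0.000 +/-  0.000 |    0.019 +/-  0.004' \
      'SER  10 | R SER  10 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |    0.003 +/-  0.004 |   -0.009 +/-  0.004 |    0.000 +/-  0.000 |   -0.007 +/-  0.002' \
      > "$sample"

    # NR > 9 skips the header lines; with whitespace splitting, the residue name
    # and number are fields 1 and 2 and the TOTAL value is field 28.
    extracted=$(awk 'NR > 9 && NF >= 28 { print $1 "_" $2, $28 }' "$sample")
    printf '%s\n' "$extracted"
    rm -f "$sample"
    ```

    For these two rows the sketch prints LYS_5 0.019 and SER_10 -0.007, which matches the values in the TOTAL column.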

    Data: SO74141830.dat

    | Run on Thu Oct 20 14:59:37 2022
    || GB non-polar solvation energies calculated with gbsa=2
    idecomp = 1: Per-residue decomp adding 1-4 interactions to Internal.
    Energy Decomposition Analysis (All units kcal/mol): Generalized Born solvent
    
    DELTAS:
    Total Energy Decomposition:
    Residue |  Location |       Internal      |    van der Waals    |    Electrostatic    |   Polar Solvation   |    Non-Polar Solv.  |       TOTAL
    -------------------------------------------------------------------------------------------------------------------------------------------------------
    SER   1 | R SER   1 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.092 +/-  0.012 |    0.092 +/-  0.012 |    0.000 +/-  0.000 |    0.000 +/-  0.001
    GLY   2 | R GLY   2 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |    0.001 +/-  0.001 |   -0.001 +/-  0.001 |    0.000 +/-  0.000 |    0.000 +/-  0.001
    PHE   3 | R PHE   3 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.003 +/-  0.001 |    0.004 +/-  0.001 |    0.000 +/-  0.000 |    0.000 +/-  0.001
    ARG   4 | R ARG   4 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.160 +/-  0.025 |    0.164 +/-  0.025 |    0.000 +/-  0.000 |    0.003 +/-  0.001
    LYS   5 | R LYS   5 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.211 +/-  0.038 |    0.230 +/-  0.038 |    0.000 +/-  0.000 |    0.019 +/-  0.004
    MET   6 | R MET   6 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.006 +/-  0.003 |    0.010 +/-  0.003 |    0.000 +/-  0.000 |    0.004 +/-  0.001
    ALA   7 | R ALA   7 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |   -0.019 +/-  0.003 |    0.023 +/-  0.003 |    0.000 +/-  0.000 |    0.003 +/-  0.001
    PHE   8 | R PHE   8 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |    0.020 +/-  0.003 |   -0.018 +/-  0.003 |    0.000 +/-  0.000 |    0.001 +/-  0.001
    PRO   9 | R PRO   9 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |    0.002 +/-  0.002 |    0.002 +/-  0.003 |    0.000 +/-  0.000 |    0.004 +/-  0.001
    SER  10 | R SER  10 |    0.000 +/-  0.000 |   -0.000 +/-  0.000 |    0.003 +/-  0.004 |   -0.009 +/-  0.004 |    0.000 +/-  0.000 |   -0.007 +/-  0.002
    

    Script:

    ### extract data from file
    reset session
    
    FILE = "SO74141830.dat"
    
    set datafile separator "|"
    set table $Data
        plot FILE u (word(strcol(1),1).'_'.word(strcol(1),2)):(word(strcol(8),1)) skip 9 w table
    unset table
    set datafile separator whitespace   # reset to default
    print $Data
    
    set table $Data
        plot FILE u (strcol(1).'_'.strcol(2)):(column(28)) skip 9 w table
    unset table
    print $Data
    
    ### end of script
    

    Result:

    SER_1   0.000
    GLY_2   0.000
    PHE_3   0.000
    ARG_4   0.003
    LYS_5   0.019
    MET_6   0.004
    ALA_7   0.003
    PHE_8   0.001
    PRO_9   0.004
    SER_10  -0.007
    
    SER_1    0
    GLY_2    0
    PHE_3    0
    ARG_4    0.003
    LYS_5    0.019
    MET_6    0.004
    ALA_7    0.003
    PHE_8    0.001
    PRO_9    0.004
    SER_10   -0.007
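
As noted at the start of this answer, sorting is where an external tool may still be the simpler choice. The following is a rough sketch rather than a definitive recipe: it assumes a POSIX shell with sort, takes the reduced two-column data printed above as input, and uses the hypothetical variable names reduced and sorted.

```shell
# Reduced two-column data, as printed by the gnuplot script above.
reduced='SER_1 0.000
GLY_2 0.000
PHE_3 0.000
ARG_4 0.003
LYS_5 0.019
MET_6 0.004
ALA_7 0.003
PHE_8 0.001
PRO_9 0.004
SER_10 -0.007'

# Sort numerically on the second column (ascending).
# LC_ALL=C keeps "." as the decimal point regardless of the user's locale.
sorted=$(printf '%s\n' "$reduced" | LC_ALL=C sort -k2,2n)
printf '%s\n' "$sorted"
```

With this input, SER_10 (-0.007) comes out first and LYS_5 (0.019) last. The sorted lines could then be written between the `$data << EOD` and `EOD` lines in the same way the script in the question does, so gnuplot receives the bars already in the desired order.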