Using the following multi-column data as input:
| Run on Thu Oct 20 14:59:37 2022
|| GB non-polar solvation energies calculated with gbsa=2
idecomp = 1: Per-residue decomp adding 1-4 interactions to Internal.
Energy Decomposition Analysis (All units kcal/mol): Generalized Born solvent
DELTAS:
Total Energy Decomposition:
Residue | Location | Internal | van der Waals | Electrostatic | Polar Solvation | Non-Polar Solv. | TOTAL
-------------------------------------------------------------------------------------------------------------------------------------------------------
SER 1 | R SER 1 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.092 +/- 0.012 | 0.092 +/- 0.012 | 0.000 +/- 0.000 | 0.000 +/- 0.001
GLY 2 | R GLY 2 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.001 +/- 0.001 | -0.001 +/- 0.001 | 0.000 +/- 0.000 | 0.000 +/- 0.001
PHE 3 | R PHE 3 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.003 +/- 0.001 | 0.004 +/- 0.001 | 0.000 +/- 0.000 | 0.000 +/- 0.001
ARG 4 | R ARG 4 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.160 +/- 0.025 | 0.164 +/- 0.025 | 0.000 +/- 0.000 | 0.003 +/- 0.001
LYS 5 | R LYS 5 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.211 +/- 0.038 | 0.230 +/- 0.038 | 0.000 +/- 0.000 | 0.019 +/- 0.004
MET 6 | R MET 6 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.006 +/- 0.003 | 0.010 +/- 0.003 | 0.000 +/- 0.000 | 0.004 +/- 0.001
ALA 7 | R ALA 7 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.019 +/- 0.003 | 0.023 +/- 0.003 | 0.000 +/- 0.000 | 0.003 +/- 0.001
PHE 8 | R PHE 8 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.020 +/- 0.003 | -0.018 +/- 0.003 | 0.000 +/- 0.000 | 0.001 +/- 0.001
PRO 9 | R PRO 9 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.002 +/- 0.002 | 0.002 +/- 0.003 | 0.000 +/- 0.000 | 0.004 +/- 0.001
SER 10 | R SER 10 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.003 +/- 0.004 | -0.009 +/- 0.004 | 0.000 +/- 0.000 | -0.007 +/- 0.002
I call awk to extract the first and 8th columns, which reduces the input to two-column data in a slightly modified format:
SER_1 0.000
GLY_2 0.000
PHE_3 0.000
ARG_4 0.003
LYS_5 0.019
MET_6 0.004
ALA_7 0.003
PHE_8 0.001
PRO_9 0.004
SER_10 -0.007
Then I am trying to plot a 2D bar chart using gnuplot combined with awk, integrated into a bash script:
echo "vizualisation with Gnuplot + AWK (ver 2): plot data from stdin!"
{
echo '$data << EOD'
# reduce input format to 2D columns and rename the IDs
awk '
NF==8 { gsub(/^[[:space:]]+|[[:space:]]+$/,"",$1) # strip leading/trailing spaces from 1st field
gsub(/[[:space:]]+/,"_",$1) # convert each run of contiguous spaces to a single underscore
gsub(/^[[:space:]]+|[[:space:]]+$/,"",$8) # strip leading/trailing spaces from 8th field
split($8,a,"[[:space:]]") # split 8th field on white space
if (a[1]+0 == a[1] && a[1] > -10 ) # if 1st sub-field is numeric and > -10 then ...
print $1,a[1] # print to stdout
}
' $file |
if [ "$SORT_BARS" = 1 ]; then sort -k1,1; else cat
fi |
if [ "$COLOR_DATA" = 1 ]; then
awk -v colors="$color1 $color2 $color3 $color4 $color5 $color6" '
BEGIN { nc = split(colors,clrArr) }
{ print $0, clrArr[NR % nc + 1] }
'
else cat; fi
echo 'EOD'
cat << EOF
set term pngcairo size 800,600
set title "$file_name" noenhanced font "Century,22" textcolor "#b8860b"
set xtics noenhanced font "Helvetica,10"
set xlabel "Residue, #"
set ylabel "dG, kcal/mol"
set yrange [0:-8]
set ytics 0.1
set grid y
set key off
set boxwidth 0.9
set style fill solid 0.5
plot \$data using 0:2:3:xtic(1) with boxes lc rgb var, \
'' using 0:2:2 with labels offset 0,1
EOF
} | gnuplot > ${output}/${file_name2}.png
which produces the following error:
gnuplot> plot $data using 0:2:3:xtic(1) with boxes lc rgb var
^
line 1: x range is invalid
I previously used this script to plot the same kind of graph from the same type of input data with positive values. How could I adapt it to the new format?
The resulting graph should look something like this (produced via xm-grace without bar coloring):
Just as an example: although gnuplot is primarily a plotting program, it can also do some data processing without the help of external tools. Extracting the values from your input data with gnuplot looks (at least to me) easier than your awk script. When it comes to sorting, however, gnuplot is less convenient, so depending on the kind of sort you need you might be back to external tools.
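Since sorting is the step where an external tool is most likely to be needed, here is a minimal sketch of sorting the extracted two-column data by value before handing it to gnuplot. The function name `sort_by_total` is hypothetical, and the sketch assumes whitespace-separated `label value` lines like the reduced data shown in the question.

```shell
#!/bin/sh
# Hypothetical helper: sort "label value" lines by the numeric 2nd column.
# -k2,2n restricts the key to field 2 and compares numerically, so
# negative and fractional values are ordered correctly.
# LC_ALL=C keeps "." as the decimal point regardless of the user's locale.
sort_by_total() {
    LC_ALL=C sort -k2,2n
}

# Example call with three of the extracted rows
printf '%s\n' 'LYS_5 0.019' 'SER_10 -0.007' 'ARG_4 0.003' | sort_by_total
```

In the question's pipeline this would take the place of `sort -k1,1`, which sorts by label rather than by value.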
If your table has a strictly regular structure you could do the following:
Version 1: set the column separator to "|" and further split each column via word (check help datafile separator and help word).
Version 2: if you keep the default column separator (which is whitespace), the string and numerical data to extract are in columns 1, 2, and 28 (check help strcol and help column).
Do not forget to skip the 9 header lines (check help skip).
Data: SO74141830.dat
| Run on Thu Oct 20 14:59:37 2022
|| GB non-polar solvation energies calculated with gbsa=2
idecomp = 1: Per-residue decomp adding 1-4 interactions to Internal.
Energy Decomposition Analysis (All units kcal/mol): Generalized Born solvent
DELTAS:
Total Energy Decomposition:
Residue | Location | Internal | van der Waals | Electrostatic | Polar Solvation | Non-Polar Solv. | TOTAL
-------------------------------------------------------------------------------------------------------------------------------------------------------
SER 1 | R SER 1 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.092 +/- 0.012 | 0.092 +/- 0.012 | 0.000 +/- 0.000 | 0.000 +/- 0.001
GLY 2 | R GLY 2 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.001 +/- 0.001 | -0.001 +/- 0.001 | 0.000 +/- 0.000 | 0.000 +/- 0.001
PHE 3 | R PHE 3 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.003 +/- 0.001 | 0.004 +/- 0.001 | 0.000 +/- 0.000 | 0.000 +/- 0.001
ARG 4 | R ARG 4 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.160 +/- 0.025 | 0.164 +/- 0.025 | 0.000 +/- 0.000 | 0.003 +/- 0.001
LYS 5 | R LYS 5 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.211 +/- 0.038 | 0.230 +/- 0.038 | 0.000 +/- 0.000 | 0.019 +/- 0.004
MET 6 | R MET 6 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.006 +/- 0.003 | 0.010 +/- 0.003 | 0.000 +/- 0.000 | 0.004 +/- 0.001
ALA 7 | R ALA 7 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | -0.019 +/- 0.003 | 0.023 +/- 0.003 | 0.000 +/- 0.000 | 0.003 +/- 0.001
PHE 8 | R PHE 8 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.020 +/- 0.003 | -0.018 +/- 0.003 | 0.000 +/- 0.000 | 0.001 +/- 0.001
PRO 9 | R PRO 9 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.002 +/- 0.002 | 0.002 +/- 0.003 | 0.000 +/- 0.000 | 0.004 +/- 0.001
SER 10 | R SER 10 | 0.000 +/- 0.000 | -0.000 +/- 0.000 | 0.003 +/- 0.004 | -0.009 +/- 0.004 | 0.000 +/- 0.000 | -0.007 +/- 0.002
Script:
### extract data from file
reset session
FILE = "SO74141830.dat"
# Version 1: use "|" as separator and split the fields with word()
set datafile separator "|"
set table $Data
plot FILE u (word(strcol(1),1).'_'.word(strcol(1),2)):(word(strcol(8),1)) skip 9 w table
unset table
set datafile separator whitespace # reset to default
print $Data
# Version 2: default whitespace separator, columns 1, 2 and 28
set table $Data
plot FILE u (strcol(1).'_'.strcol(2)):(column(28)) skip 9 w table
unset table
print $Data
### end of script
Result:
SER_1 0.000
GLY_2 0.000
PHE_3 0.000
ARG_4 0.003
LYS_5 0.019
MET_6 0.004
ALA_7 0.003
PHE_8 0.001
PRO_9 0.004
SER_10 -0.007
SER_1 0
GLY_2 0
PHE_3 0
ARG_4 0.003
LYS_5 0.019
MET_6 0.004
ALA_7 0.003
PHE_8 0.001
PRO_9 0.004
SER_10 -0.007
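The extracted totals run from -0.007 to 0.019, whereas the script in the question hard-codes `set yrange [0:-8]`. One possible adaptation, offered as a sketch rather than a confirmed fix for the reported error, is to derive the range from the data. The helper name `yrange_from_data` and the 10% padding are assumptions made for illustration; gnuplot also autoscales the y axis if no `set yrange` is given at all.

```shell
#!/bin/sh
# Hypothetical helper: read "label value" lines on stdin and print a
# gnuplot "set yrange" command spanning the data, padded by 10% of the
# spread on each side. Returns non-zero if there is no input.
yrange_from_data() {
    LC_ALL=C awk '
        {
            v = $2 + 0                      # force numeric comparison
            if (NR == 1 || v < min) min = v
            if (NR == 1 || v > max) max = v
        }
        END {
            if (NR == 0) exit 1             # no data: emit nothing
            pad = (max - min) * 0.1
            if (pad == 0) pad = 1           # avoid a zero-height range
            printf "set yrange [%.4f:%.4f]\n", min - pad, max + pad
        }'
}

# Example call with two of the extracted rows
printf '%s\n' 'LYS_5 0.019' 'SER_10 -0.007' | yrange_from_data
```

The printed line could be emitted into the generated gnuplot script in place of the fixed `set yrange`, with `set ytics` adjusted to match the much smaller scale.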