I would like to create an array in gnuplot with character indices. I have five data files named a.txt, b.txt, c.txt, etc. Each text file has two columns of code profiling data (the function name and the CPU time in seconds):
# f tcpu, s
function1 221.284
function2 161.412
function3 167.322
I would like to plot the normalised profiling data for all the files in a single histogram plot. Prior to normalising the data, I have to find the maximum time value in each data file. I do this with the gnuplot stats command in a do for loop. I would like to save the maxima in an array tmax. It would be convenient to use character array indices (tmax['a']), but I cannot find a relevant example anywhere. Please find my minimal example below:
array tmax[5]
# detect and save maxima values
do for [l in "a b c d e"] {
    fname = sprintf('%s.txt', l)
    stats fname using 2
    tmax[l] = STATS_max
}
# print out maxima values
do for [l in "a b c d e"] {
    print l
    print tmax[l]
}
This attempt fails with the array index out of range error when I try to save the STATS_max value in tmax[l]. If possible, could you please suggest how to use character indices in a gnuplot array? The full gnuplot script follows:
#!/usr/bin/gnuplot
set style data histogram
set style fill solid
set style histogram clustered
set xtics rotate by 45 offset 0,0 right
set lmargin 8
set bmargin 11
# border
set style line 11 lc rgb '#808080' lt 1
set border 3 back ls 11
set tics nomirror
# grid
set style line 12 lc rgb '#808080' lt 0 lw 1
set grid back ls 12
set terminal postscript eps size 3.5,2.62 enhanced color \
font 'Helvetica, 10' linewidth 1
set output 'advisor.eps'
set xlabel 'C++ function'
set ylabel 't_{cpu} norm, %'
array tmax[5]
# detect and save maxima values
do for [l in "a b c d e"] {
    fname = sprintf('%s.txt', l)
    stats fname using 2
    tmax[l] = STATS_max
}
# print out maxima values
do for [l in "a b c d e"] {
    print l
    print tmax[l]
}
plot "a.txt" using ($2/tmax['a']):xtic(1) title 'p=1, t=1', \
"b.txt" using ($2/tmax['b']) title 'p=1, t=8', \
"c.txt" using ($2/tmax['c']) title 'p=8, t=1', \
"d.txt" using ($2/tmax['d']) title 'p=4, t=2', \
"e.txt" using ($2/tmax['e']) title 'p=2, t=4'
An example data file is:
# f tcpu, s
NUTS\\_prop 221.284
Grow\\_tree 161.412
Grow\\_branch 167.322
stan\\_gradient 160.204
log\\_prob\\_grad 160.034
leapfrog\\_integrator 128.392
log\\_prob 116.953
poisson\\_log\\_log 80.262
poisson\\_log\\_lpmf 80.252
add 77.031
This would be my suggestion. Instead of arrays, you can also store your data in strings and address the values via real() and word() (check help real and help word). This will also work with older gnuplot 4.x versions.
Actually, I'm not sure whether I fully understood your normalization.
The example below divides the data of each file by the maximum time in that file. That's what I understood from your question.
Another option would be to compare how long the different settings (p=1, t=1; p=1, t=8; etc.) take for a certain function (normalized to 1 for the longest time).
Script:
### loop several files for normalized histogram
reset session
myFiles = "a b c d e"
myFile(n) = sprintf("%s.txt",word(myFiles,n))
# create some random test data
myFunctions = "f1 f2 f3 f4 f5 f6"
do for [i=1:words(myFiles)] {
    set print myFile(i)
    do for [j=1:words(myFunctions)] {
        print sprintf("%s %g", word(myFunctions,j),rand(0)*150+70)
    }
    set print
}
myTmaxs = ''
do for [i=1:words(myFiles)] {
    stats myFile(i) u 2 nooutput
    myTmaxs = myTmaxs.sprintf(" %g",STATS_max)
}
myTmax(n) = real(word(myTmaxs,n))
print myTmaxs
myTitles = '"p=1, t=1" "p=1, t=8" "p=8, t=1" "p=4, t=2" "p=2, t=4"'
myTitle(n) = word(myTitles,n)
set style data histogram
set style fill solid 0.6
set style histogram clustered
set grid y
set key out
set yrange [0:]
plot for [i=1:words(myFiles)] myFile(i) u ($2/myTmax(i)):xtic(1) title myTitle(i)
### end of script
Result:
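If array syntax is preferred over strings, one possible variant is to keep an integer-indexed array and add a small lookup function that maps a name to its position. This is only a sketch: it assumes gnuplot 5.2 or later (where arrays are available) and that the files a.txt to e.txt exist; idx() is a hypothetical helper, not a gnuplot built-in. gnuplot arrays accept only integer indices from 1 to the array size, which is why tmax[l] with a string index fails in the question.

```gnuplot
# Sketch: integer-indexed array plus a name-to-index lookup (gnuplot >= 5.2)
myFiles = "a b c d e"

# idx(s) is a hypothetical helper: position of s in myFiles, 0 if absent
idx(s) = sum [k=1:words(myFiles)] (word(myFiles, k) eq s ? k : 0)

array tmax[words(myFiles)]

# detect and save maxima values, one array slot per file
do for [i=1:words(myFiles)] {
    stats sprintf("%s.txt", word(myFiles, i)) using 2 nooutput
    tmax[i] = STATS_max
}

# look up a maximum by name; a name not in myFiles gives index 0,
# which is out of range and raises an error
print tmax[idx("c")]
```

With this in place, an expression such as ($2/tmax[idx("a")]) could be used in the plot command. For single-character names, strstrt("abcde", "c") should give the same position without a helper function.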