
Gnuplot array with character indices


I would like to create a gnuplot array that is indexed by characters. I have five data files named a.txt, b.txt, c.txt, d.txt and e.txt. Each file has two columns of code profiling data (the function name and the CPU time in seconds):

# f         tcpu, s
function1   221.284
function2   161.412
function3   167.322

I would like to plot the normalised profiling data for all the files in a single histogram plot. Before normalising the data I have to find the maximum time value in each data file. I do this with the gnuplot stats command inside a do for loop, and I would like to save the maxima in an array tmax. It would be convenient to use character indices for the array (tmax['a']), but I cannot find a relevant example anywhere. My minimal example is below:

array tmax[5]

# detect and save maxima values
do for [l in "a b c d e"] {
    fname = sprintf('%s.txt', l)
    stats fname using 2
    tmax[l] = STATS_max
}

# print out maxima values
do for [l in "a b c d e"] {
    print l
    print tmax[l]
}

This attempt fails with an "array index out of range" error when I try to save the STATS_max value in tmax[l]. How can I use character indices in a gnuplot array? The full gnuplot script follows:

#!/usr/bin/gnuplot

set style data histogram
set style fill solid
set style histogram clustered
set xtics rotate by 45 offset 0,0 right
set lmargin  8
set bmargin 11

# border
set style line 11 lc rgb '#808080' lt 1
set border 3 back ls 11
set tics nomirror

# grid
set style line 12 lc rgb '#808080' lt 0 lw 1
set grid back ls 12

set terminal postscript eps size 3.5,2.62 enhanced color \
    font 'Helvetica, 10' linewidth 1

set output 'advisor.eps'
set xlabel 'C++ function'
set ylabel 't_{cpu} norm, %'

array tmax[5]

# detect and save maxima values
do for [l in "a b c d e"] {
    fname = sprintf('%s.txt', l)
    stats fname using 2
    tmax[l] = STATS_max
}

# print out maxima values
do for [l in "a b c d e"] {
    print l
    print tmax[l]
}

plot "a.txt" using ($2/tmax['a']):xtic(1) title 'p=1, t=1', \
     "b.txt" using ($2/tmax['b'])         title 'p=1, t=8', \
     "c.txt" using ($2/tmax['c'])         title 'p=8, t=1', \
     "d.txt" using ($2/tmax['d'])         title 'p=4, t=2', \
     "e.txt" using ($2/tmax['e'])         title 'p=2, t=4'

An example data file:

# f                     tcpu, s
NUTS\\_prop             221.284
Grow\\_tree             161.412
Grow\\_branch           167.322
stan\\_gradient         160.204
log\\_prob\\_grad       160.034
leapfrog\\_integrator   128.392
log\\_prob              116.953
poisson\\_log\\_log     80.262
poisson\\_log\\_lpmf    80.252
add                     77.031

Solution

  • gnuplot arrays accept only integer indices (1 to the array size), so tmax['a'] is not possible. My suggestion: instead of an array, store the values in a string and address them via real() and word() (check help real and help word). This also works with older gnuplot 4.x versions.
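
    If you would rather keep an array, one possible workaround is to translate each one-letter name into an integer position with strstrt(). This is only a sketch: the helper names (names, idx) are made up for illustration, it assumes gnuplot 5.2 or later for array support, and it only works as written while every name is a single character.

    ```gnuplot
    # sketch: keep the array, map each one-letter name to an integer index
    names  = "abcde"               # hypothetical helper string, one character per file
    idx(s) = strstrt(names, s)     # idx("a") = 1 ... idx("e") = 5, 0 if not found

    array tmax[strlen(names)]

    # detect and save maxima values
    do for [l in "a b c d e"] {
        stats sprintf("%s.txt", l) using 2 nooutput
        tmax[idx(l)] = STATS_max
    }

    # print out maxima values
    do for [l in "a b c d e"] {
        print sprintf("%s: %g", l, tmax[idx(l)])
    }
    ```

    In the plot command the lookup would then read ($2/tmax[idx('a')]) instead of ($2/tmax['a']). For names longer than one character, a lookup based on word() would be needed instead of strstrt().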

    Actually, I'm not sure whether I fully understood your normalization. The example below divides the data of each file by the maximum time in that file, which is what I understood from your question. Another option would be to compare, for each function, how long the different settings (p=1, t=1; p=1, t=8; etc.) take, normalized to 1 for the longest time.

    Script:

    ### loop several files for normalized histogram
    reset session
    
    myFiles   = "a b c d e"
    myFile(n) = sprintf("%s.txt",word(myFiles,n))
    
    # create some random test data
    myFunctions = "f1 f2 f3 f4 f5 f6"
    do for [i=1:words(myFiles)] {
        set print myFile(i)
            do for [j=1:words(myFunctions)] {
                print sprintf("%s %g", word(myFunctions,j),rand(0)*150+70)
            }
        set print
    }
    
    myTmaxs = ''
    do for [i=1:words(myFiles)] {
        stats myFile(i) u 2 nooutput
        myTmaxs = myTmaxs.sprintf(" %g",STATS_max)
    }
    myTmax(n) = real(word(myTmaxs,n))
    print myTmaxs
    
    myTitles   = '"p=1, t=1" "p=1, t=8" "p=8, t=1" "p=4, t=2" "p=2, t=4"'
    myTitle(n) = word(myTitles,n)
    
    set style data histogram
    set style fill solid 0.6
    set style histogram clustered
    set grid y
    set key out
    set yrange [0:]
    
    plot for [i=1:words(myFiles)] myFile(i) u ($2/myTmax(i)):xtic(1) title myTitle(i)
    ### end of script
    

    Result: (plot image not included)
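
    The other reading of the normalization mentioned above (each function divided by its longest time across all settings) could be sketched roughly as follows. It assumes gnuplot 5.2 or later for arrays, that every file lists the same functions in the same order, and that all times are positive. The array name fmax is made up for illustration, and the file names are used as titles to keep the sketch short.

    ```gnuplot
    ### sketch: normalize each function by its longest time across all files
    myFiles   = "a b c d e"
    myFile(n) = sprintf("%s.txt", word(myFiles, n))

    stats myFile(1) u 2 nooutput
    N = int(STATS_records)             # number of functions (data rows)

    array fmax[N]                      # hypothetical name: per-function maxima
    do for [j=1:N] { fmax[j] = 0 }

    do for [i=1:words(myFiles)] {
        do for [j=1:N] {
            k = j - 1                  # "every" counts data rows from 0
            stats myFile(i) u 2 every ::k::k nooutput
            if (STATS_max > fmax[j]) { fmax[j] = STATS_max }
        }
    }

    set style data histogram
    set style histogram clustered
    set style fill solid 0.6
    set yrange [0:]

    # $0 is the 0-based row number, so row r is divided by fmax[r+1]
    plot for [i=1:words(myFiles)] myFile(i) u ($2/fmax[int($0)+1]):xtic(1) title myFile(i)
    ### end of sketch
    ```

    Here the tallest bar in each cluster reaches 1, so the plot shows which setting is slowest for a given function rather than which function dominates within one file.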