Tags: linux, statistics, gnuplot

Save output from 'stats' command in gnuplot


I want to statistically analyse the output files of a benchmark that runs on 600 nodes. In particular, I need the maximum, upper quartile, median, lower quartile, minimum and mean values. My output files are testrun16-[1-600].

I use the following code:

ListofFiles = system('dir testrun16-*')

set print 'MaxValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_max
}

set print 'upquValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_up_quartile
}

set print 'MedianValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_median
}

set print 'loquValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_lo_quartile
}

set print 'MinValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_min
}

set print 'MeanValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_mean
}

unset print
set term x11
set title 'CLAIX2016 distribution of OSnoise using FWQ'
set xlabel "Number of Nodes"
set ylabel "Runtime [ns]"
plot 'MaxValues.dat' using 1 title 'maximum value', 'upquValues.dat' title 'upper quartile', 'MedianValues.dat' using 1 title 'median value', 'loquValues.dat' title 'lower quartile', 'MinValues.dat' title 'minimum value', 'MeanValues.dat' using 1 title 'mean value';
set term png
set output 'noises.png'
replot

I get these values and can plot them. However, the tuples from each run get mixed up: the mean of testrun16-17.dat is plotted at x=317, and its minimum is not at x=17 either.

How can I save the output while keeping the tuples together, and plot each node at its actual position?
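To see the mix-up directly, one can print which file ends up at which x position. With a single 'using' column, gnuplot plots that column against the row index ($0, starting at 0), so x is simply the position of the file in the directory listing, not the node number. A small diagnostic sketch, assuming Linux with 'ls' in place of 'dir' and files named testrun16-<n>.dat:

```gnuplot
# Diagnostic sketch: which file ends up at which x position?
# Assumption: Linux, 'ls' instead of 'dir', files named testrun16-<n>.dat.
# stderr is discarded so an error message never lands in the file list.
ListofFiles = system('ls testrun16-*.dat 2>/dev/null')
nFiles = words(ListofFiles)
do for [i=1:nFiles] {
    # 'using 1' plots column 1 against the row index $0, which starts at 0
    print sprintf("x = %3d  <-  %s", i-1, word(ListofFiles, i))
}
```

If the printed mapping does not read x = n-1 for testrun16-n.dat, the listing order is what scatters the points.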


Solution

  • Windows (and maybe Linux) sorts the entries of a directory list in its own way, typically alphabetically rather than numerically, so testrun16-10 can come before testrun16-2. Since your plots use the row index as x, that ordering decides where each node lands. To eliminate this uncertainty you can loop over your files by number. However, this assumes that all numbers from 1 to the maximum (=FilesCount, in your case 600) actually exist. You tagged Linux, sorry, but I only know Windows, and the command to get a list of only the file names in Windows is 'dir /B testrun16-*'.

    Is there a special reason why you write the statistics into 6 different files? Why not into one file?

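    Something like this (modified after the OP's comment). It extracts the node number from each file name and writes it as the first column next to the statistics, so every node is plotted at its own x position regardless of the listing order: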

    ### batch statistics
    reset session
    
    FileRootName = 'testrun16'
    FileList = system('dir /B '.FileRootName.'-*')
    FilesCount =  words(FileList)
    print "Files found: ", FilesCount
    
    # function for extracting the number from the filename 
    GetFileNumber(s) = int(s[strstrt(s,"-")+1:strstrt(s,".dat")-1])
    
    set print FileRootName.'_Statistics.dat'
        print "File Max UpQ Med LoQ Min Mean"
        do for [FILE in FileList] {
            stats FILE u 1 nooutput
            print sprintf("%d %g %g %g %g %g %g", \
            GetFileNumber(FILE), \
            STATS_max, STATS_up_quartile, STATS_median, \
            STATS_lo_quartile, STATS_min, STATS_mean)
        }
    set print
    
    plot FileRootName.'_Statistics.dat' \
           u 1:2 title 'maximum value', \
        '' u 1:3 title 'upper quartile', \
        '' u 1:4 title 'median value', \
        '' u 1:5 title 'lower quartile', \
        '' u 1:6 title 'minimum value', \
        '' u 1:7 title 'mean value'
    ### end of code
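The answer also mentions looping over the files by number instead of reading a directory listing. A minimal sketch of that variant follows, under these assumptions: Linux (so 'ls' stands in for 'dir /B' and is only used to count the files), the '.dat' extension, and consecutive numbering from 1 to N with no gaps. 'FileName' is a hypothetical helper introduced here. The output has the same columns as the code above, so the same plot command can be reused.

```gnuplot
### batch statistics, looping over file numbers (sketch)
reset session

FileRootName = 'testrun16'

# Assumption: Linux, files named testrun16-1.dat ... testrun16-N.dat with no gaps.
# 'ls' replaces 'dir /B' and is only used to count the files.
FilesCount = words(system('ls '.FileRootName.'-*.dat 2>/dev/null'))
print "Files found: ", FilesCount

# hypothetical helper: build a file name from its node number
FileName(i) = sprintf('%s-%d.dat', FileRootName, i)

set print FileRootName.'_Statistics.dat'
    print "File Max UpQ Med LoQ Min Mean"
    do for [i=1:FilesCount] {
        FILE = FileName(i)
        stats FILE u 1 nooutput
        # the node number goes into column 1, so it becomes the x value
        print sprintf("%d %g %g %g %g %g %g", i, \
            STATS_max, STATS_up_quartile, STATS_median, \
            STATS_lo_quartile, STATS_min, STATS_mean)
    }
set print
### end of code
```

Because the rows are produced in numeric order and column 1 holds the node number, each node keeps its tuple together and lands at its own x. If some numbers are missing, 'stats' will stop with an error on the first absent file.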