I want to statistically analyse the output files from a benchmark that runs on 600 nodes. In particular, I need the max, upper quartile, median, lower quartile, min and mean values. My output files are testrun16-[1-600], which I process with this code:
ListofFiles = system('dir testrun16-*')
set print 'MaxValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_max
}
set print 'upquValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_up_quartile
}
set print 'MedianValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_median
}
set print 'loquValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_lo_quartile
}
set print 'MinValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_min
}
set print 'MeanValues.dat'
do for [file in ListofFiles]{
stats file using 1 nooutput
print STATS_mean
}
unset print
set term x11
set title 'CLAIX2016 distribution of OSnoise using FWQ'
set xlabel "Number of Nodes"
set ylabel "Runtime [ns]"
plot 'MaxValues.dat' using 1 title 'maximum value', 'upquValues.dat' title 'upper quartile', 'MedianValues.dat' using 1 title 'median value', 'loquValues.dat' title 'lower quartile', 'MinValues.dat' title 'minimum value', 'MeanValues.dat' using 1 title 'mean value';
set term png
set output 'noises.png'
replot
I get these values and can plot them. However, the tuples from each run get mixed up: the mean of testrun16-17.dat is plotted at x=317, and its min also ends up at a different position.
How can I save the output but keep the tuples together and plot each node at its actual position?
Windows (and Linux?) might have some special way to sort (or unsort) data in a directory list. To eliminate this uncertainty you can loop over your files by number. However, this assumes that all numbers from 1 to the maximum (=FilesCount, in your case 600) actually exist.
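The loop-by-number idea could look roughly like the sketch below. It assumes the files are named testrun16-1.dat to testrun16-600.dat (the pattern is guessed from the testrun16-17.dat mentioned in the question) and that none of them is missing, because stats stops with an error on a file it cannot open. The output name testrun16_Statistics.dat is only an example.

```gnuplot
### loop over the files by number (sketch, not tested on the real data)
FileRootName = 'testrun16'
FilesCount   = 600     # assumption: files 1..600 all exist

set print FileRootName.'_Statistics.dat'
print "# File Max UpQ Med LoQ Min Mean"
do for [i=1:FilesCount] {
    # assumed naming scheme: testrun16-<number>.dat
    FILE = sprintf('%s-%d.dat', FileRootName, i)
    stats FILE u 1 nooutput
    print sprintf("%d %g %g %g %g %g %g", i, \
          STATS_max, STATS_up_quartile, STATS_median, \
          STATS_lo_quartile, STATS_min, STATS_mean)
}
set print
### end of code
```

Since the loop index i is written as the first column, each row keeps its node number no matter how the directory listing would have been sorted.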
You tagged Linux, sorry, but I only know Windows, where the command to get a list of only the filenames is 'dir /B testrun16-*'.
Is there a special reason why you write the statistics into 6 different files? Why not into one file?
Something like this: (modified after OP comment)
### batch statistics
reset session
FileRootName = 'testrun16'
# Windows command; on Linux something like system('ls '.FileRootName.'-*') should do the same
FileList = system('dir /B '.FileRootName.'-*')
FilesCount = words(FileList)
print "Files found: ", FilesCount
# function for extracting the number from the filename
GetFileNumber(s) = int(s[strstrt(s,"-")+1:strstrt(s,".dat")-1])
set print FileRootName.'_Statistics.dat'
print "File Max UpQ Med LoQ Min Mean"
do for [FILE in FileList] {
    stats FILE u 1 nooutput
    print sprintf("%d %g %g %g %g %g %g", \
          GetFileNumber(FILE), \
          STATS_max, STATS_up_quartile, STATS_median, \
          STATS_lo_quartile, STATS_min, STATS_mean)
}
set print
plot FileRootName.'_Statistics.dat' \
        u 1:2 title 'maximum value', \
     '' u 1:3 title 'upper quartile', \
     '' u 1:4 title 'median value', \
     '' u 1:5 title 'lower quartile', \
     '' u 1:6 title 'minimum value', \
     '' u 1:7 title 'mean value'
### end of code