
Gnuplot: How to make stats consider records with less data points than expected?


I have a data file whose records consist of a timestamp and one or two data points.

When I run stats <datafile> using 2:3, the x values in the records with one data point are ignored.

Example data file:

$ echo '1 10 1
2 20 2
3 50
4 40 4' > test.dat

and Gnuplot invocation:

$ echo 'stats "test.dat" using 2:3' | gnuplot 2>&1 | grep Maximum

  Maximum:           40.0000 [2]        4.0000 [2]

I can run two separate stats:

$ echo 'stats "test.dat" using 2
stats "test.dat" using 3
' | gnuplot 2>&1 | grep Maximum

  Maximum:           50.0000 [1]
  Maximum:            4.0000 [1]

This works, but is there a more idiomatic way to do it?

(Additionally, in some cases I need to ignore the ranges when running the second stats, via stats [*:*][*:*].)


Solution

  • Let me summarize your issue: if you extract the maxima of columns 2 and 3 via stats $Data u 2:3, gnuplot ignores every line that has no 3rd column. Hence, depending on the data, you might miss the absolute maximum of column 2.

    If you insist on using only a single stats command, you can do the following:

    • initialize ymax = NaN
    • first check whether you have a 3rd column or not. Check help valid.
    • if the value in column 3 is not NaN and ymax is still NaN, initialize ymax=$3 (see: gnuplot: How to compare to NaN?)
    • in a serial evaluation, check whether the current value of column 3 is larger than the current ymax and, if so, assign ymax=$3. Check help operators binary (serial evaluation) and help ternary.
    • assuming that you always have a 2nd column, the stats u (..., $2) command will effectively run on column 2, hence STATS_max will hold the maximum of the second column.
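
    The building blocks named in these steps can be tried in isolation first. The following is only a minimal sketch; the variable names x, y, a and b are made up for illustration and do not appear in the script further down. valid() is left out because it is only meaningful inside a using expression.

    ```gnuplot
    # NaN test: NaN never compares equal to itself, so "x != x" is true only for NaN
    x = NaN
    y = 5
    print x != x      # 1
    print y != y      # 0

    # serial evaluation: (e1, e2) evaluates both in order and yields the value of e2
    a = (b = 3, b * 2)
    print a, b        # 6 3

    # ternary operator: condition ? value_if_true : value_if_false
    print (y > 3 ? "large" : "small")    # large
    ```

    In the stats command, the same comma operator lets the using expression update ymax as a side effect while the value handed to stats is still $2.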

    Overall, what is easier: running stats twice, i.e. stats $Data u 2 and stats $Data u 3, or the script below?

    Script:

    ### get maxima from partly empty columns with a single stats command
    reset session
    
    $Data <<EOD
    -1  NaN
     0    5   NaN
     1   10   1
     2   20   2
     3   50
     4   40   4
     5   30   3
    EOD
    
    ymax = NaN
    stats $Data u (valid(3) ? ($3==$3 && ymax!=ymax ? ymax=$3 : 0, $3>ymax ? ymax=$3:0) : 0, $2) nooutput
    
    print STATS_max, ymax
    ### end of script
    

    or in a single line:

    ymax = NaN; stats $Data u (valid(3) ? ($3==$3 && ymax!=ymax ? ymax=$3 : 0, $3>ymax ? ymax=$3:0) : 0, $2) nooutput;
    

    Result:

    50.0 4.0
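
    For comparison, here is a sketch of the two-stats variant. It assumes the file test.dat from the question is in the working directory. The name option replaces the default STATS prefix, so the results of the first call are not overwritten by the second; the prefixes A and B are arbitrary choices for this example.

    ```gnuplot
    ### get maxima with one stats call per column
    # assumes test.dat as created in the question
    stats "test.dat" u 2 name "A" nooutput    # column 2 only: the line "3 50" is included
    stats "test.dat" u 3 name "B" nooutput    # column 3 only: lines without a 3rd column are skipped

    print A_max, B_max
    ### end of script
    ```

    With the question's data this should print 50.0 4.0, matching the maxima reported by the two separate stats calls in the question.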