Is gnuplot able to summarize x rows as a single data point?


The question is whether I can easily change the resolution of the data in gnuplot.

Say I have an (N+1)-column file consisting of x and y1, y2, ..., yN values. The x data points have a resolution of one minute.

What I would like to do is set a parameter (say 5) so that gnuplot computes one data point from the first 5 rows for every y column, then one from the next 5 rows, and so on (that is: take the first x value of each 5-row interval and, for every y column, sum the values over that interval and divide by 5).

Is this possible without changing the data file itself? Maybe such a feature already exists? Or do you think manipulating the data file is the more elegant way?

Thx for your thoughts.


Solution

  • Yes, it is possible. However, it is a bit complicated in the general case, so you might prefer to do the averaging in a separate preprocessing step before passing the data to gnuplot.
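
    If you go the preprocessing route, the averaging can be done by a short script outside gnuplot. Below is a minimal sketch in Python, not part of the original answer. It assumes a whitespace-separated file whose first field is x (kept as text, so timestamps without spaces survive) and whose remaining fields are numeric. The names block_average and average_file are made up for illustration. It keeps the first x value of each block, as the question asks, and drops a trailing block with fewer than n rows.

```python
def block_average(rows, n):
    """Average the y columns over consecutive blocks of n rows.

    rows is a list of (x, [y1, ..., yN]) pairs. Each output pair keeps the
    x value of the first row in its block. A trailing block with fewer than
    n rows is dropped (an assumption; averaging it would also be reasonable).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    out = []
    for start in range(0, len(rows) - n + 1, n):
        block = rows[start:start + n]
        x_first = block[0][0]
        # Transpose the y values so that each tuple holds one column.
        y_columns = zip(*(ys for _, ys in block))
        out.append((x_first, [sum(col) / n for col in y_columns]))
    return out


def average_file(in_path, out_path, n):
    """Read a whitespace-separated data file and write the block averages."""
    rows = []
    with open(in_path) as src:
        for line in src:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue  # skip blank lines and comment lines
            rows.append((fields[0], [float(f) for f in fields[1:]]))
    with open(out_path, "w") as dst:
        for x, means in block_average(rows, n):
            dst.write(" ".join([x] + [repr(m) for m in means]) + "\n")
```

    For example, average_file("data.dat", "averaged.dat", 5) would write one row per five input rows, and gnuplot could then plot the hypothetical file averaged.dat directly with plot "averaged.dat" using 1:2.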

    First note that you can create a filter to accept and plot every Nth point by using the pseudo-column 0 (line number) and the "not a number" flag NaN. The +1 is because the line numbering starts at zero.

    filter(y) = ((int(column(0)+1)%N == 0) ? y : NaN)
    plot DATA using 1:(filter(column(2)))
    

    Now we make the filter more complicated by adding a serial evaluation operator to each of the two outcome paths. Serial evaluation guarantees that comma-separated expressions are evaluated in left-to-right order with the net value set to the value of the rightmost expression.

    sum_and_filter(y) = (int(column(0)+1)%N == 0) ? (out=(sum+y)/N, sum=0, out) : (sum=sum+y, NaN)
    sum = 0
    plot DATA using 1:(sum_and_filter(column(2)))
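
    To see what the accumulator does row by row, here is a small emulation of the same logic in Python (used instead of gnuplot only so the values can be checked outside a plot). It is a sketch, and the function name sum_and_filter_trace is invented for illustration.

```python
import math


def sum_and_filter_trace(ys, n):
    """Return the value sum_and_filter would hand to the plot for each row."""
    total = 0.0  # plays the role of the gnuplot variable `sum`
    plotted = []
    for row_index, y in enumerate(ys):  # row_index plays the role of column(0)
        if (row_index + 1) % n == 0:
            # Last row of a group: emit the mean and reset the accumulator.
            out = (total + y) / n
            total = 0.0
            plotted.append(out)
        else:
            # Any other row: accumulate and emit NaN, which gnuplot skips.
            total += y
            plotted.append(math.nan)
    return plotted
```

    For ys = 1, 2, ..., 10 and n = 5 this yields NaN for rows 0 to 3, 3.0 at row 4, NaN for rows 5 to 8, and 8.0 at row 9. Because the mean is emitted on the last row of each group, gnuplot plots it at that row's x value rather than at the first x value of the group.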
    

    Here is a plot showing both the original data and the output from filtering

    N = 5
    sum = 0
    plot DATA using 1:2 with linespoints pt 1, \
         DATA using 1:(sum_and_filter(column(2))) with points pt 7
    


    In the special case that your x values are equally spaced, a much simpler option has been available since one of the gnuplot 5.2 releases: the keyword bins does the accumulation for you. Left to itself, however, it plots the sum rather than the mean of the bin contents, so you need prior knowledge of the x spacing to calculate the mean. Here I add the bins option to the previous plot, dividing by 5, which assumes that each bin of width 50 holds five samples (an x spacing of 10). Note that it does not use the filtering function at all.

    set style fill transparent solid 0.25
    # Reset the accumulator so the filter starts from a clean state for this plot.
    sum = 0
    plot DATA using 1:2 with linespoints pt 1, \
         DATA using 1:(sum_and_filter(column(2))) with points pt 7, \
         DATA using 1:($2/5) bins binwidth=50 with boxes title "bin width 50"
    


    The bins are drawn centered about the corresponding range of x values. You might want to shift them by including an offset in the x expression: plot DATA using ($1-offset):($2/5) bins binwidth=50 with boxes.
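
    The divisor in ($2/5) is the number of samples per bin, which follows from the bin width and the x spacing. As a sanity check before hard-coding it, a small Python sketch like the following could compute it from the x column. The helper name points_per_bin and the tolerance are assumptions for illustration, and this is not how gnuplot itself works internally.

```python
def points_per_bin(xs, binwidth, tol=1e-9):
    """Return how many equally spaced samples fit into one bin of width binwidth.

    Raises ValueError if the x values are not equally spaced or if binwidth
    is not a whole multiple of the spacing, since a fixed divisor would then
    not give the mean.
    """
    if len(xs) < 2:
        raise ValueError("need at least two x values")
    dx = xs[1] - xs[0]
    if dx <= 0:
        raise ValueError("x values must be increasing")
    for a, b in zip(xs, xs[1:]):
        if abs((b - a) - dx) > tol * max(1.0, abs(dx)):
            raise ValueError("x values are not equally spaced")
    ratio = binwidth / dx
    count = round(ratio)
    if count < 1 or abs(ratio - count) > tol * max(1.0, ratio):
        raise ValueError("binwidth is not a whole multiple of the x spacing")
    return count
```

    With x values 0, 10, 20, ... and a bin width of 50 this returns 5, matching the divisor used above. Bins at the ends of the data range may contain fewer samples, in which case dividing by this count underestimates their mean.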