How to set the range of stats function in gnuplot?


I have a time series as shown below. I would like to plot all the data together with the mean value over a specific range, e.g. 3, 6, or 9 months.

    Time        T        D      T/D
    8/1/2021    1785.28  23.99  74.42
    7/1/2021    1807.84  25.68  70.40
    6/1/2021    1834.57  27     67.95
    5/1/2021    1850.26  27.5   67.28
    4/1/2021    1760.04  25.69  68.51
    3/1/2021    1718.23  25.65  66.99
    2/1/2021    1808.17  27.29  66.26
    1/1/2021    1866.98  25.88  72.14
    12/1/2020   1858.42  24.97  74.43
    11/1/2020   1866.3   24.08  77.50
    10/1/2020   1900.27  24.23  78.43
    9/1/2020    1921.92  25.74  74.67
    8/1/2020    1968.63  27     72.91

I am using gnuplot 5.2 and tried to plot with the following code, but stats did not work as I expected.

    # plot data vs date
    
    reset session
    
    FILE = "data_01.dat"

    set timefmt "%m/%d/%Y"
    stats ["8/1/2020":"1/1/2021"] FILE u 4 name "A"
    stats ["8/1/2020":"8/1/2021"] FILE u 4 name "B"

    set label 1  sprintf("6 months average= %.2f",A_mean) at graph 0.02, graph 0.95
    set label 2  sprintf("12 months average= %.2f",B_mean) at graph 0.02, graph 0.90

    set xdata time
    set format x "%m/%y"
    set xrange ["8/1/2020":"8/1/2021"]
    
    plot FILE u 1:4 skip 1 w lp lc rgb 'blue' t 'data' ,\
    A_mean lc rgb 'black' t '6 months avg',\
    B_mean lc rgb 'red' t '12 months avg'
    
    # end of code

The output I get looks like this (image: data_plot).

I think I made a mistake in setting the range for stats, so that it calculates the mean over the whole column instead of over the specified range, but I could not find out how to fix it. At first I tried this:

    stats ["8/1/2020":"1/1/2021"] FILE u (timecolumn(1)):4 name "A"

but it produced no output and ended with "undefined variable: A_mean". How can I properly set the range of the stats command in gnuplot?


Solution

  • Basically, Eldrad already mentioned all the essentials... when I was still coding...

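    stats does not work with time data, i.e. with set xdata time active. Furthermore, if you want to limit the range by the date in the first column, you have to pass column 1 to stats as well and give the range limits as numbers (seconds) via strptime() instead of as date strings. Note that with two columns in using, stats stores the results as A_mean_x and A_mean_y rather than A_mean. The modified code below gives a reasonable result.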

    Edit: instead of writing strptime(myTimeFmt,"8/1/2020") many times, you can define a function myTime(s) = strptime(myTimeFmt,s), which shortens everything a bit and makes it look less "scary".
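
    A side note on variable names, since this is most likely behind the "undefined variable: A_mean" error in the question: with a single column in using, stats stores the mean as <name>_mean, whereas with two columns it stores <name>_mean_x and <name>_mean_y. A minimal sketch with a made-up datablock ($Demo and the prefixes S and P are only for illustration, they are not part of the original code):

    # stats variable names: one column vs. two columns
    reset session
    $Demo <<EOD
    1 10
    2 20
    3 30
    EOD
    stats $Demo u 2   name "S" nooutput   # one column:  S_mean
    stats $Demo u 1:2 name "P" nooutput   # two columns: P_mean_x, P_mean_y
    print S_mean, P_mean_x, P_mean_y      # the means are 20, 2 and 20
    ### end of code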

    Code:

    # plot data vs date and using stats 
    reset session
    
    $Data <<EOD
    Time           T      D      T/D
    8/1/2021    1785.28 23.99   74.42
    7/1/2021    1807.84 25.68   70.40
    6/1/2021    1834.57 27  67.95
    5/1/2021    1850.26 27.5    67.28
    4/1/2021    1760.04 25.69   68.51
    3/1/2021    1718.23 25.65   66.99
    2/1/2021    1808.17 27.29   66.26
    1/1/2021    1866.98 25.88   72.14
    12/1/2020   1858.42 24.97   74.43
    11/1/2020   1866.3  24.08   77.50
    10/1/2020   1900.27 24.23   78.43
    9/1/2020    1921.92 25.74   74.67
    8/1/2020    1968.63 27  72.91
    EOD
    
    myTimeFmt = "%m/%d/%Y"
    set timefmt myTimeFmt
    myTime(s) = strptime(myTimeFmt,s)
    
    stats [myTime("8/1/2020"):myTime("1/1/2021")] $Data u (timecolumn(1)):4 name "A" nooutput
    stats [myTime("8/1/2020"):myTime("8/1/2021")] $Data u (timecolumn(1)):4 name "B" nooutput
    
    set label 1  sprintf("6 months average= %.2f",A_mean_y) at graph 0.02, graph 0.95
    set label 2  sprintf("12 months average= %.2f",B_mean_y) at graph 0.02, graph 0.90
    
    set format x "%m/%y" time
    set xrange [myTime("8/1/2020"):myTime("8/1/2021")]
    
    plot $Data u (timecolumn(1)):4 skip 1 w lp lc rgb 'blue' t 'data' ,\
         A_mean_y lc rgb 'black' t '6 months avg',\
         B_mean_y lc rgb 'red'   t '12 months avg'
    ### end of code
    

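
    The question also mentions 3 and 9 month windows. A possible sketch that handles several windows in one loop, assuming each window ends at 8/1/2021 and is given by an explicit start date. The names Starts, Months, tEnd and the prefix W are made up here, and the data is repeated without the header line so that the snippet runs on its own:

    # mean of T/D over several windows ending at the same date
    reset session
    
    $Data <<EOD
    8/1/2021    1785.28 23.99   74.42
    7/1/2021    1807.84 25.68   70.40
    6/1/2021    1834.57 27  67.95
    5/1/2021    1850.26 27.5    67.28
    4/1/2021    1760.04 25.69   68.51
    3/1/2021    1718.23 25.65   66.99
    2/1/2021    1808.17 27.29   66.26
    1/1/2021    1866.98 25.88   72.14
    12/1/2020   1858.42 24.97   74.43
    11/1/2020   1866.3  24.08   77.50
    10/1/2020   1900.27 24.23   78.43
    9/1/2020    1921.92 25.74   74.67
    8/1/2020    1968.63 27  72.91
    EOD
    
    myTimeFmt = "%m/%d/%Y"
    set timefmt myTimeFmt
    myTime(s) = strptime(myTimeFmt,s)
    
    Starts = "6/1/2021 3/1/2021 12/1/2020"   # first month of each window (hypothetical choice)
    Months = "3 6 9"                         # window lengths, used for the printout only
    tEnd   = myTime("8/1/2021")              # common end date of all windows
    
    do for [i=1:words(Starts)] {
        t0 = myTime(word(Starts,i))
        # two columns in using, so the mean of column 4 ends up in W_mean_y
        stats [t0:tEnd] $Data u (timecolumn(1)):4 name "W" nooutput
        print sprintf("%s months average = %.2f", word(Months,i), W_mean_y)
    }
    ### end of code

    Each pass overwrites the W_* variables, so copy W_mean_y into a variable of its own inside the loop if the value is needed later, e.g. for a label or a horizontal line in the plot.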