Search code examples
gnuplot

max value for same minute over multiple days from csv with unix timestamps


I have a CSV with a unix timestamp column that was collected over multiple days having a data row for every 5 minutes (output log of my photo voltaik roof power plant).

I'd like to create a plot for 24 hours that shows the maximum value for every single (fifth) minute over all days.

Can this be done with gnuplots own capabilities or do I have to do the processing outside gnuplot via scrips?


You don't show how your exact data structure looks like, - theozh

This files are rather large. I placed an example here:
http://www.filedropper.com/log-pv-20190607-20190811 (300kB)

I'm specially interested in column 4 (DC1 P) and 9 (DC2 P). Column 1 (Zeit) holds the unix timestamp.

The final goal is separate graphs (colors) for DC1 P and DC2 P, but that's a different question... ;o)


Solution

  • Update/Revision: After revisiting this answer, I guess it is time for a clean up and simpler and extended solution. After some iterations and clarifications and after OP provided some data (although, the link is not valid anymore), I came up with some suggestions, which can be improved.

    You can do all in gnuplot, no need for external tools!

    The original request to plot the maximum values from several days is easy if you use the plotting style with boxes. But this is basically only a graphical solution. In that case is was apparently sufficient. However, if you are interested in the maximum values as numbers it is a little bit more effort.

    gnuplot has the option smooth unique and smooth frequency (check help smooth). With this you can easily get the average and sum, respectively, but there is no smooth max or smooth min. As @meuh suggested, you can get maximum or mimimum with arrays, which are available since gnuplot 5.2.0

    Script: (Requires gnuplot>=5.2.0)

    ### plot time data modulo 24h avg/sum/min/max
    reset session
    
    FILE = 'log-pv-20190607-20190811.csv'
    
    set datafile separator comma
    HeaderCount = 7
    myTimeFmt   = "%Y-%m-%d %H:%M:%S"
    
    StartTime = ''
    EndTime   = ''
    # if you don't define start/end time it will be taken automatically
    if (StartTime eq '' || EndTime eq '') {
        stats FILE u 1 skip HeaderCount nooutput
        StartTime = (StartTime eq '' ? STATS_min : strptime(myTimeFmt,StartTime))
        EndTime   = (EndTime   eq '' ? STATS_max : strptime(myTimeFmt,EndTime))
    }
    
    Modulo24Hours(t) = (t>=StartTime && t<=EndTime) ? (int(t)%86400) : NaN
    set key noautotitle
    
    set multiplot layout 3,2
    
        set title "All data" offset 0,-0.5
        set format x "%d.%m." timedate
        set grid x,y
        set yrange [0:]
        myHeight = 1./3*1.1
        set size 1.0,myHeight
        plot FILE u 1:4:(tm_mday($1)) skip HeaderCount w l lc var
        set multiplot next
    
        set title "Data per 24 hours"
        set format x "%H:%M" timedate
        set xtics 3600*6
        set size 0.5,myHeight
        plot FILE u (Modulo24Hours($1)):4:(tm_mday($1)) skip HeaderCount w l lc var
    
        set title "Average"
        set size 0.5,myHeight
        plot FILE u (int(Modulo24Hours($1))):4 skip HeaderCount smooth unique w l lc "web-green"
    
        set title "Sum"
        set size 0.5,myHeight
        plot FILE u (int(Modulo24Hours($1))):4 skip HeaderCount smooth freq w l
    
        set title "Min/Max"
        set size 0.5,myHeight
        N = 24*60/5
        SecPerDay = 3600*24
        array Min[N]
        array Max[N]
        do for [i=1:N] { Min[i]=NaN; Max[i]=0 }  # initialize arrays
        stats FILE u (idx=(int($1)%SecPerDay)/300+1, $4>Max[idx] ? Max[idx]=$4:0, \
              Min[idx]!=Min[idx] ? Min[idx]=$4 : $4<Min[idx] ? Min[idx]=$4:0 ) skip HeaderCount nooutput
    
        plot Min u ($1*300):2 w l lc "web-blue", \
             Max u ($1*300):2 w l lc "red"
    
    unset multiplot
    ### end of script
    

    Result:

    enter image description here