
Plot HTTP Status Codes Grouped by Days


I have a stream of timestamped HTTP status codes:

2021-02-09T10:54:00 200 50
2021-02-09T10:57:00 200 35
2021-02-09T11:00:00 200 50
2021-02-09T11:03:00 500 150
2021-02-09T11:06:00 500 350
2021-02-09T11:09:00 500 450
2021-02-09T11:12:00 500 1000
2021-02-09T11:15:00 404 35
2021-02-09T11:18:00 404 50
2021-02-09T11:21:00 200 50
2021-02-09T11:24:00 200 35
2021-02-09T11:27:00 200 50
2021-02-09T11:30:00 200 50

I have already managed to set up gnuplot to group the data by day:

set xdata time
set ydata time
set format y "%H:%M"
set timefmt "%Y-%m-%dT%H:%M:%S"
set xrange ["2021-02-08T00:00:00":"2021-02-14T23:59:59"]

plot 'availability.csv' using (timecolumn(1,"%Y-%m-%d")):(timecolumn(1,"%H-%M")):2…

I have already found a lot of samples, such as summing over the day (boxes/histogram) or marking the point in time per day (points), but none of them matches my goal of showing availability over time.

My goal is to have one bar per day, binned into 15-minute blocks. Each block should be colored according to the maximum status code in it, e.g. HTTP 500 = red, HTTP 404 = yellow, HTTP 200 = green (only these 3, no teapot/redirect/spooky ones, with the colors working as a sort of traffic light). The y-axis is the hour of the day, the x-axis is the day.

  1. Am I on the right track, is this possible at all with gnuplot?
  2. What does the using clause look like?
  3. How is binning to 15min intervals merged into the second column?
  4. How to color the specific codes? (It is not like a heatmap calculating color from frequency)

Solution

  • Interesting challenge. My suggestion would be the following. It's probably not the easiest approach, but I would say the result looks reasonable. It uses the plotting style boxxyerror (see help boxxyerror), which draws one rectangle per data row.
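
    As a minimal sketch of that plotting style on its own: the datablock $Boxes and its values are made up for illustration. With four data columns boxxyerror reads x, y, xdelta and ydelta, and the fifth column is used here as an index into the color list.

    ```gnuplot
    # minimal boxxyerror sketch; $Boxes is a hypothetical datablock
    myColorList = "0x00ff00 0xffff00 0xff0000"   # green, yellow, red

    $Boxes <<EOD
    1  1  0.4  0.4  1
    2  1  0.4  0.4  2
    3  1  0.4  0.4  3
    EOD

    set style fill solid 0.5 noborder
    set xrange [0:4]
    set yrange [0:2]

    # columns: x : y : xdelta : ydelta : color index
    plot $Boxes u 1:2:3:4:(int(word(myColorList, int($5)))) \
         w boxxyerror lc rgb var notitle
    ```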

    From your question, I gather that you want a binning of 15 minutes and to display only the color of the maximum status in each interval. Why not show a histogram of the different states for each interval instead? For example, if an interval contains the HTTP states 2x 200, 1x 404 and 2x 500, then the horizontal bar for this interval will be split into 40% green, 20% yellow and 40% red.
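
    The arithmetic behind that split, as a small sketch (the variable names n200, n404, n500 and x0..x3 are only illustrative and do not appear in the full script):

    ```gnuplot
    # counts per status within one 15-minute interval (from the example above)
    n200 = 2; n404 = 1; n500 = 2
    total = n200 + n404 + n500

    # cumulative relative boundaries of the three segments;
    # real() avoids gnuplot's integer division
    x0 = 0.0
    x1 = x0 + real(n200)/total
    x2 = x1 + real(n404)/total
    x3 = x2 + real(n500)/total

    print sprintf("green  %.1f .. %.1f", x0, x1)   # green  0.0 .. 0.4
    print sprintf("yellow %.1f .. %.1f", x1, x2)   # yellow 0.4 .. 0.6
    print sprintf("red    %.1f .. %.1f", x2, x3)   # red    0.6 .. 1.0
    ```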

    In outline, the following code:

    1. creates some random test data (just for illustration),
    2. bins the data using smooth freq (check help smooth), adding a small offset of 1, 2 or 3 seconds to the bin time to keep the 3 different states apart,
    3. rearranges the tables (removes empty lines and sorts the events by time),
    4. creates the final table with the x,y positions of the boxes, with widths corresponding to the relative contribution of each status within the binning interval.
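
    The binning in step 2 can be sketched for a single timestamp as follows. The names t, tBin and key are illustrative only. Unlike the full script, which bins the time of day and adds the date separately, this sketch floors the whole timestamp; the bin start is the same because a day is a whole multiple of 900 seconds.

    ```gnuplot
    BinWidthSec = 900                                   # 15 minutes
    myDateTimeFmt = "%Y-%m-%dT%H:%M:%S"

    t    = strptime(myDateTimeFmt, "2021-02-10T12:39:00")
    tBin = floor(t/BinWidthSec)*BinWidthSec             # start of the 15-minute bin
    print strftime(myDateTimeFmt, tBin)                 # 2021-02-10T12:30:00

    # status offset: 1 = 200, 2 = 404, 3 = 500 (here: a 404 event)
    key = tBin + 2
    print sprintf("%.0f", key)
    ```

    Because every bin start is divisible by 4, the offset can later be recovered from the key with a modulo operation.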

    In order to get a better understanding:

    Example data of datablock $Data:

    2021-02-10T12:30:00   200   407
    2021-02-10T12:33:00   200   922
    2021-02-10T12:36:00   404   615
    2021-02-10T12:39:00   200   689
    2021-02-10T12:42:00   200   628
    2021-02-10T12:45:00   500   10
    2021-02-10T12:48:00   200   185
    2021-02-10T12:51:00   200   2
    2021-02-10T12:54:00   404   743
    2021-02-10T12:57:00   200   618
    

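    Example data of datablock $Histo3 (first column: bin start in seconds since the epoch plus an offset, where 0 = all states, 1 = 200, 2 = 404, 3 = 500; second column: number of events):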

    1612960200  5  i
    1612960201  4  i
    1612960202  1  i
    1612961100  5  i
    1612961101  3  i
    1612961102  1  i
    1612961103  1  i
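
    To read such a key by hand, the offset and the bin start can be separated again. This is only a sketch for checking the table; key, statusNo and binStart are illustrative names.

    ```gnuplot
    key      = 1612960201
    statusNo = int(key) % 4          # 0 = total, 1 = 200, 2 = 404, 3 = 500
    binStart = key - statusNo
    print sprintf("%s  status index %d", \
          strftime("%Y-%m-%dT%H:%M:%S", binStart), statusNo)
    # 2021-02-10T12:30:00  status index 1
    ```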
    

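    Example data of datablock $Histo4 (columns: date, relative start, relative end, bin start time, status number; the NaN rows come from the per-bin totals and draw nothing):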

            NaN     0   nan   12:30   0     
     2021-02-10     0   0.8   12:30   1     
     2021-02-10   0.8     1   12:30   2     
            NaN     0   nan   12:45   0     
     2021-02-10     0   0.6   12:45   1     
     2021-02-10   0.6   0.8   12:45   2     
     2021-02-10   0.8     1   12:45   3   
     
    

    The code can certainly be optimized. So, look at it as a starting point...

    Code:

    ### status overview as date/time dependent histograms
    reset session
    
    # general settings
    myDateFmt     = "%Y-%m-%d"                    # date only format
    myTimeFmt     = "%H:%M:%S"                    # time only format
    myDateTimeFmt = myDateFmt."T".myTimeFmt       # datetime format
    SecPerDay     = 24*3600                       # seconds per day
    myStatusList  = "200 404 500"                 # possible states
    myColorList   = "0x00ff00 0xffff00 0xff0000"  # green, yellow, red
    
    # create some random test data
    set print $Data
        myTime = time(0)                                 # now
        myRandomStatus(x) = x<0.70 ? 1 : x<0.95 ? 2 : 3  # random status
        myInterval = 3                                   # interval in minutes
        do for [i=1:5000] {
            myTime = myTime + myInterval*60
            myStatus = word(myStatusList,myRandomStatus(rand(0)))  # random status
            myValue = int(rand(0)*1000)                       # random value 0-999
            print sprintf("%s   %s   %g",strftime("%Y-%m-%dT%H:%M:00",myTime),myStatus,myValue)
        }
    set print
    
    # functions
    myStatusNo(col) = column(col)==200 ? 1 : column(col)==404 ? 2 : 3
    myColor(i)      = int(i) ? int(word(myColorList,int(i))) : 1
    myDayTime(t)    = tm_hour(t)*3600 + tm_min(t)*60 + tm_sec(t)
    
    # binning 
    BinWidthSec   = 900        # in seconds 900 sec = 15 min
    BinTime(col)  = floor(myDayTime(timecolumn(col,myDateTimeFmt))/BinWidthSec)*BinWidthSec
    
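    set table $Histo1
        set format x "%.0f"
        # total number of events per bin (key = date + bin start, divisible by 4)
        plot $Data u (timecolumn(1,myDateFmt)+BinTime(1)):(1) smooth freq
        # number of events per bin and status (key = date + bin start + 1, 2 or 3 seconds)
        plot $Data u (timecolumn(1,myDateFmt)+BinTime(1)+myStatusNo(2)):(1) smooth freq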
    set table $Histo2
        plot $Histo1 u (sprintf("%.0f",$1)):2 w table   # remove empty lines etc.
    set table $Histo3
        set format x "%.0f"
        plot $Histo2 u 1:2 smooth freq                  # sort the events by time
    unset table
    
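    # create final table
    # key%4==0 marks a bin total: reset Sum, remember Total and emit a NaN row;
    # key%4==1,2,3 are the states: accumulate Sum to get the relative start/end of each box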
    myX(col1,col2) = int(column(col1))%4==0 ? (Sum=0.0, Total=column(col2),"NaN") : \
                     strftime(myDateFmt,column(col1))
    myXRelStart(col1,col2) = Sum/Total
    myXRelEnd(col1,col2) = int(column(col1))%4==0 ? NaN : (Sum=Sum+column(col2), Sum/Total)
    BinTimeT(col) = strftime("%H:%M",column(col))
    
    set table $Histo4
        plot $Histo3 u (sprintf("% 10s % 5g % 5g % 7s % 3d", \
             myX(1,2), myXRelStart(1,2), myXRelEnd(1,2), BinTimeT(1), tm_sec($1))) w table
    unset table
    
    # plot settings
    set format x "%d.%m." timedate
    set format y "%H:%M" timedate
    set style fill transparent solid 0.5 noborder
    set yrange [0:SecPerDay]
    set tics out
    set key out title "HTTP status"
    
    plot $Histo4 u (timecolumn(1,myDateFmt)+($3+$2)/2*SecPerDay) : \
                   (timecolumn(4,myTimeFmt)+BinWidthSec/2) : \
                   (($3-$2)/2*SecPerDay) : (BinWidthSec/2.):(myColor($5)) \
                   w boxxy lc rgb var notitle, \
         for [i=1:3] keyentry w boxes lc rgb myColor(i) title word(myStatusList,i)
    
    ### end of code
    

    Result:

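    [Image: the resulting plot, with days on the x-axis, time of day on the y-axis, and each 15-minute bin split into green, yellow and red boxes by HTTP status share]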