Search code examples
javasqlhsqldbusage-statistics

Get Averages Between Timestamps with multiple Statuses


I am trying to gather simple statistics on data, such as time a service is online, time a service is offline, averages, etc. I've found a few solutions already, but they all rely on certain things, such as the rows being back to back (ROW_NUMBER -1) or there only being two states.

My data comes in the form of logs, always after the fact (i.e. no live data). The largest problem for me to figure out is that there are more than two states. Currently, there are a possibility of four different states (Enabled, Disabled, Active, Inactive) and I'd like to be able to collect data on each.

I'm provided the data a single line at a time containing a Service Name, Old Status, New Status, and a Timestamp. Currently the data is stored in a single table. I cannot change how the data is provided, but I can change how it is stored and I'm beginning to think that table is my primary draw back.

Here is an example of how the data may end up in my table currently:

CREATE TABLE IF NOT EXISTS statusupdates (
  sid int UNIQUE, 
  fullname VARCHAR(64), 
  oldstatus VARCHAR(16), 
  newstatus VARCHAR(16), 
  time TIMESTAMP);

INSERT INTO statusupdates VALUES
(null, 'fictHTTP', 'Off', 'On', '2017-01-01 02:20:00'),
(null, 'faked', 'On', 'Inactive', '2017-01-01 02:25:00'),
(null, 'ipsum', 'Inactive', 'On', '2017-01-01 02:30:00'),
(null, 'resultd', 'On', 'Inactive', '2017-01-01 02:35:00'),
(null, 'ipsum', 'On', 'Active', '2017-01-01 02:40:00'),
(null, 'fictHTTP', 'On', 'Active', '2017-01-01 02:45:00'),
(null, 'faked', 'Inactive', 'Off', '2017-01-01 02:50:00'),
(null, 'ipsum', 'Active', 'Off', '2017-01-01 02:55:00'),
(null, 'resultd', 'Inactive', 'Off', '2017-01-01 03:00:00');

I believe one method I've found is to narrow it down to one item, such as resultd. Something like SELECT fullname, newstatus, time FROM statusupdates WHERE fullname='resultd' ORDER BY time DESC;. Then with that data, do another query with the same method, but go one step forward (since it's descending order) and get newstatus from that record. As I type that, it seems sloppy.

Alternatively grab oldstatus and in the second query, use it to find newstatus of the following record. But again, this may be sloppy.

I know there is a way to combine those two theoretical queries as well. So, to summarize, I'm far over my head, forgive me! In the end I'd like to see stats like total time, average time, etc for each status. My biggest hurdle right now is getting query to deliver a result, for example, every timestamp entry for ipsum in such a way that I can get the duration of time from the prior entry, plus repeat this until it's gone through all of the records.

Or, perhaps, I'm entirely over thinking this and am making it too complex by shoving all the data into one table--which I've done twice on this project thus far for unrelated items.

Additional thought: A single instance, I could do SELECT old_status, new_status, time FROM statusupdates WHERE time = '2017-01-01 03:00:00' Then I could use old_status like this, SELECT old_status, new_status, time FROM statusupdates WHERE time < 'timeStamp' AND new_status = 'oldStatus' Then subtract the two timestamps which would give me the data for one example. But, then how to do it for the next step, and the next, until its hit all of them.

Update, Another thought: With a combination of some of your fantastic suggestions, what about reading the logs backward? Nevermind, at that point it wouldn't matter what direction they were read. When it encounters a status, create an incomplete record. It would contain old_status and time_stamp as end_time. Then when it encounters that service again, it checks if new_status = old_status and update the record with time_stamp as start_time.

This seems like it would cause a hell of a lot of overhead though. Every record would have to be checked to see if it exists, if not make one, if yes update one. Or maybe that's not too bad?


Solution

  • Do you have access to window functions in your database? If so, you can get the value of the next row for each record (partitioned by fullname):

      select  fullname,
              newstatus,
              avg( time_diff ) as avg_time
      from    (
                select  fullname,
                        oldstatus,
                        newstatus,
                        /* get the time value of the next row for this fullname record */
                        lead( time ) over( 
                          partition by fullname 
                          order by time 
                          rows between 1 following and 1 following 
                        ) as next_time,
                        time,
                        next_time - time as time_diff
                from    statusupdates
              ) as a
       group by fullname,
              newstatus
    

    EDIT

    In the absence of window functions, you can get the next_time in a slightly more convoluted way:

    select a.*,
           b.next_time
    from   statusupdates as a
           left join
           (
           select a.fullname,
                  a.time,
                  min( b.time ) as next_time
           from   statusupdates as a
                  left join
                  statusupdates as b
                  on a.fullname = b.fullname
                  and a.time < b.time
           group by a.fullname,
                  a.time
           ) as b
           on a.fullname = b.fullname
           and a.time = b.time
    ;