Search code examples
c#algorithmrate

Calculate event rate per second


I have a game file with millions of events, file size can be > 10gb

Each line is a game action, like:

player 1, action=kill, timestamp=xxxx(ms granularity)
player 1, action=jump, timestamp=xxxx
player 2, action=fire, timestamp=xxxx

Each action is unique and finite for this data set.

I want to perform analysis on this file, like the total number of events per second, while tracking the individual number of actions in that second.

My plan in semi pseudocode:

lastReadGameEventTime = DateTime.MinValue;

while(line=getNextLine() != null)
{
   parse_values(lastReadGameEventTime, out var timestamp, out var action);
   if(timestamp ==  MinValue)
   {
      lastReadGameEventTime = timestamp;
   }
   else if(timestamp.subtract(lastReadGameEventTime).TotalSeconds > 1)
   {
      notify_points_for_this_second(datapoints);
      datapoints = new T();
   }

   if(!datapoints.TryGetValue(action, out var act))
      act = new Dictionary<string,int>();
      act[action] = 0;
   else
      act[action]++;
}
  lastReadGameEventTime = parse_time(line)

My worry is that this is too naive. I was thinking maybe count the entire minute and get the average per second. But of course I will miss game event spikes. And if I want to calculate a 5 day average, it will further degrade the result set. Any clever ideas?


Solution

  • You're asking several different questions here. All are related. Your requirements aren't real detailed, but I think I can point you in the right direction. I'm going to assume that all you want is number of events per second, for some period in the past. So all we need is some way to hold an integer (count of events) for every second during that period.

    There are 86,400 seconds in a day. Let's say you want 10 days worth of information. You can build a circular buffer of size 864,000 to hold 10 days' worth of counts:

    const int SecondsPerDay = 86400;
    const int TenDays = 10 * SecondsPerDay;
    
    int[] TenDaysEvents = new int[TenDays];
    

    So you always have the last 10 days' of counts.

    Assuming you have an event handler that reads your socket data and passes the information to a function, you can easily keep your data updated:

    DateTime lastEventTime = DateTime.MinValue;
    int lastTimeIndex = 0;
    
    void ProcessReceivedEvent(string event)
    {
        // here, parse the event string to get the DateTime
        DateTime eventTime = GetEventDate(event);
        if (lastEventTime == DateTime.MinValue)
        {
            lastTimeIndex = 0;
        }
        else if (eventTime != lastEventTime)
        {
            // get number of seconds since last event
            var elapsedTime = eventTime - lastEventTime;
            var elapsedSeconds = (int)elapsedTime.TotalSeconds;
    
            // For each of those seconds, set the number of events to 0
            for (int i = 1; i <= elapsedSeconds; ++i)
            {
                lastTimeIndex = (lastTimeIndex + 1) % TenDays; // wrap around if we get past the end
                TenDaysEvents[lastTimeIndex] = 0;
            }
        }
        // Now increment the count for the current time index
        ++TenDaysEvents[lastTimeIndex];
    }
    

    This keeps the last 10 days in memory at all times, and is easy to update. Reporting is a bit more difficult because the start might be in the middle of the array. That is, if the current index is 469301, then the starting time is at 469302. It's a circular buffer. The naive way to report on this would be to copy the circular buffer to another array or list, with the starting point at position 0 in the new collection, and then report on that. Or, you could write a custom enumerator that counts back from the current position and starts there. That wouldn't be especially difficult to create.

    The beauty of the above is that your array remains static. You allocate it once, and just re-use it. You might want to add an extra 60 entries, though, so that there's some "buffer" between the current time and the time from 10 days ago. That will prevent the data for 10 days ago from being changed during a query. Add an extra 300 items, to give yourself a 5-minute buffer.

    Another option is to create a linked list of entries. Again, one per second. With that, you add items to the end of the list, and remove older items from the front. Whenever an event comes in for a new second, add the event entry to the end of the list, and then remove entries that are more than 10 days (or whatever your threshold is) from the front of the list. You can still use LINQ to report on things, as recommended in another answer.

    You could use a hybrid, too. As each second goes by, write a record to the database, and keep the last minute, or hour, or whatever in memory. That way, you have up-to-the-second data available in memory for quick reports and real-time updates, but you can also use the database to report on any period since you first started collecting data.

    Whatever you decide, you probably should keep some kind of database, because you can't guarantee that your system won't go down. In fact, you can pretty much guarantee that your system will go down at some point. It's no fun losing data, or having to scan through terabytes of log data to re-build the data that you've collected over time.