Search code examples
algorithmdata-structuresmoving-averageweighted-average

Data structure/algorithm to efficiently save weighted moving average


I'd like to sum up moving averages for a number of different categories when storing log records. Imagine a service that saves web server logs one entry at a time. Let's further imagine, we don't have access to the logged records. So we see them once but don't have access to them later on.

For different pages, I'd like to know

  • the total number of hits (easy)
  • a "recent" average (like one month or so)
  • a "long term" average (over a year)

Is there any clever algorithm/data model that allows to save such moving averages without having to recalculate them by summing up huge quantities of data?

I don't need an exact average (exactly 30 days or so) but just trend indicators. So some fuzziness is not a problem at all. It should just make sure that newer entries are weighted higher than older ones.

One solution probably would be to auto-create statistics records for each month. However, I don't even need past month statistics, so this seems like overkill. And it wouldn't give me a moving average but rather swap to new values from month to month.


Solution

  • An easy solution would be to keep an exponentially decaying total.

    It can be calculated using the following formula:

    newX = oldX * (p ^ (newT - oldT)) + delta
    

    where oldX is the old value of your total (at time oldT), newX is the new value of your total (at time newT); delta is the contribution of new events to the total (for example the number of hits today); p is less or equal to 1 and is the decay factor. If we take p = 1, then we have the total number of hits. By decreasing p, we effectively decrease the interval our total describes.