I'd like to sum up moving averages for a number of different categories when storing log records. Imagine a service that saves web server logs one entry at a time. Let's further imagine, we don't have access to the logged records. So we see them once but don't have access to them later on.
For different pages, I'd like to know
Is there any clever algorithm/data model that allows to save such moving averages without having to recalculate them by summing up huge quantities of data?
I don't need an exact average (exactly 30 days or so) but just trend indicators. So some fuzziness is not a problem at all. It should just make sure that newer entries are weighted higher than older ones.
One solution probably would be to auto-create statistics records for each month. However, I don't even need past month statistics, so this seems like overkill. And it wouldn't give me a moving average but rather swap to new values from month to month.
An easy solution would be to keep an exponentially decaying total.
It can be calculated using the following formula:
newX = oldX * (p ^ (newT - oldT)) + delta
where oldX
is the old value of your total (at time oldT
), newX
is the new value of your total (at time newT
); delta
is the contribution of new events to the total (for example the number of hits today); p
is less or equal to 1 and is the decay factor. If we take p = 1
, then we have the total number of hits. By decreasing p
, we effectively decrease the interval our total describes.