Tags: algorithm, statistics, data-analysis, trend

Simple trend analysis algorithm


OK, so you have some historic data in the form of [say] an array of integers. This, for example, could represent free-space on a server HDD over a two-year period, with each array element representing a daily sample.

The data (free-space in this example) has a downward trend, but also has periodic positive spikes where files have been removed, compressed, etc.

How would you go about identifying the overall trend for the two-year period, i.e.: iron out the peaks and troughs in the data?

Now, I did A-level statistics and then a stats module in my degree, but I've slept over 7,000 times since then, and well, it's leaked out of my brain.

I'm not after a bit of code as such, more of a description of how you'd approach this problem...

Thanks in advance!


Solution

  • If I were doing this to produce a line through the points for me to look at, I would probably use some variant of Loess, described at http://en.wikipedia.org/wiki/Local_regression and http://stat.ethz.ch/R-manual/R-patched/library/stats/html/loess.html. Basically, you find the smoothed value at any particular point by doing a weighted regression on the data points near that point, with the nearest points given the most weight.
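
To make the description above concrete, here is a minimal sketch of that idea in plain Python. It is not R's loess, just one simple variant: a straight-line fit over a window of neighbouring samples, weighted with the tricube function. The function name `loess_smooth`, the `frac` parameter and its default, and the made-up `free_space` data are all assumptions for illustration.

```python
def loess_smooth(ys, frac=0.3):
    """Return a smoothed copy of ys (equally spaced samples, e.g. one per day).

    frac is the fraction of the data used for each local fit (an assumed
    default; larger values give a smoother line).
    """
    n = len(ys)
    if n < 3:
        return [float(y) for y in ys]
    k = max(3, min(n, round(frac * n)))  # points used per local fit
    smoothed = []
    for i in range(n):
        # Window of k neighbouring samples, shifted inwards at the ends.
        lo = max(0, min(i - k // 2, n - k))
        hi = lo + k
        # Bandwidth just beyond the farthest neighbour, so every point in
        # the window keeps a non-zero weight.
        h = max(i - lo, hi - 1 - i) + 1
        sw = swx = swy = swxx = swxy = 0.0
        for j in range(lo, hi):
            dx = j - i  # x measured relative to the point being smoothed
            w = (1 - (abs(dx) / h) ** 3) ** 3  # tricube: nearest points weigh most
            sw += w
            swx += w * dx
            swy += w * ys[j]
            swxx += w * dx * dx
            swxy += w * dx * ys[j]
        # Weighted least-squares line; its intercept is the fitted value at dx = 0.
        denom = sw * swxx - swx * swx
        slope = (sw * swxy - swx * swy) / denom
        smoothed.append((swy - slope * swx) / sw)
    return smoothed


# Hypothetical data: 365 daily samples trending down, with a spike every 30 days.
free_space = [1000 - day + (50 if day % 30 == 0 else 0) for day in range(365)]
trend = loess_smooth(free_space, frac=0.3)
```

The window size is the main knob: a small `frac` follows the spikes more closely, while a large one irons them out and leaves mostly the long-term slope. For real work, an existing implementation such as R's loess is likely a better choice than this sketch.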