What statistical methods are out there that will estimate the probability density of data as it arrives temporally?
I need to estimate the pdf of a multivariate dataset; however, new data arrives over time and the density estimate must be updated as it arrives.
What I have been doing so far is kernel density estimation: storing a buffer of the data and computing a new kernel density estimate with every batch of new data. However, I can no longer keep up with the amount of data that needs to be stored, so I need a method that keeps track of the overall pdf/density estimate rather than the individual data points. Any suggestions would be really helpful. I work in Python, but since this is a broad question, algorithm suggestions would also be helpful.
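For reference, here is a minimal sketch of the buffer-and-refit approach I described above, using scipy.stats.gaussian_kde (the function and variable names are just illustrative):

    import numpy as np
    from scipy.stats import gaussian_kde

    buffer = []  # grows with every new datum; this is the storage problem

    def update(new_point):
        # Append the new multivariate observation and refit the KDE from scratch.
        # gaussian_kde expects data with shape (n_dims, n_samples) and needs
        # enough points for a non-singular covariance estimate.
        buffer.append(np.asarray(new_point, dtype=float))
        return gaussian_kde(np.vstack(buffer).T)

    def density_at(kde, x):
        # Evaluate the current estimate at a single query point.
        return kde(np.asarray(x, dtype=float).reshape(-1, 1))[0]

Every call to update re-reads the whole buffer, which is exactly what stops scaling once the data no longer fits in memory.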
Scipy's implementation of KDE (scipy.stats.gaussian_kde) includes the functionality to build up the KDE one datum at a time instead of looping over evaluation points. This is nested inside an "if there are more points than data" branch of its evaluate method, but you could probably re-purpose it for your needs.
    if m >= self.n:
        # there are more points than data, so loop over data
        for i in range(self.n):
            diff = self.dataset[:, i, newaxis] - points
            tdiff = dot(self.inv_cov, diff)
            energy = sum(diff * tdiff, axis=0) / 2.0
            result = result + exp(-energy)
In this case, you could store the output of your KDE as result (evaluated at a fixed set of points), and each time you get a new datum you could just calculate its Gaussian contribution and add it to result. The raw data can then be dropped as needed; you are only storing the accumulated KDE.
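To make that concrete, here is a rough sketch of the accumulation idea, assuming you fix an evaluation grid and a kernel covariance (bandwidth) up front. Note that scipy normally re-derives the bandwidth from the full dataset, which you give up with this scheme, and the class and method names here are just illustrative:

    import numpy as np

    class OnlineKDE:
        def __init__(self, eval_points, cov):
            # eval_points: array of shape (d, m), the fixed grid where the
            # density is tracked; cov: (d, d) kernel covariance, chosen once.
            self.points = np.asarray(eval_points, dtype=float)
            cov = np.asarray(cov, dtype=float)
            self.inv_cov = np.linalg.inv(cov)
            self.norm = np.sqrt(np.linalg.det(2 * np.pi * cov))
            self.result = np.zeros(self.points.shape[1])
            self.n = 0

        def add(self, x):
            # Same per-datum update as the scipy loop above: one Gaussian bump
            # centred on the new point, summed into the running result.
            diff = np.asarray(x, dtype=float).reshape(-1, 1) - self.points
            tdiff = self.inv_cov @ diff
            energy = np.sum(diff * tdiff, axis=0) / 2.0
            self.result += np.exp(-energy)
            self.n += 1

        def density(self):
            # Normalised estimate at the evaluation points.
            return self.result / (self.n * self.norm)

After each add you can discard the raw datum; only result (one value per evaluation point) is kept. The trade-off is that the bandwidth and the evaluation grid are frozen at construction time.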