TSS (total sum of squares) is calculated as sum((x - mean)**2), and it is easy to calculate if all of the data is readily available. But in my case, the data is streaming continuously and I need to calculate a moving TSS on it. For example, let's say that the whole data is x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), but it arrives in batches, like this:
batch1: [1, 2, 3]
batch2: [4, 5, 6, 7]
batch3: [8, 9, 10]
How can I calculate the moving TSS in this case? An explanation along with the solution would be highly appreciated.
TSS can be split into two terms, each of which can easily be calculated incrementally:
TSS = sum[ ( X - sum[X]/N )^2 ]
    = sum[ X^2 - 2*X*sum[X]/N + sum[X]^2/N^2 ]
    = sum[X^2] - 2*sum[X]^2/N + sum[X]^2/N
    = sum[X^2] - sum[X]^2/N
You only need to maintain running totals of X and X^2, along with a count N of how many samples you've seen so far.
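As a rough sketch of how this could look with the batches from the question: the class below keeps the three running quantities and recomputes TSS after each batch. The names RunningTSS, update and tss are made up for illustration, and returning 0.0 before any data has arrived is an assumption, not something the question specifies.

```python
import numpy as np


class RunningTSS:
    """Tracks sum[X], sum[X^2] and N so TSS can be updated batch by batch."""

    def __init__(self):
        self.n = 0          # number of samples seen so far
        self.sum_x = 0.0    # running total of X
        self.sum_x2 = 0.0   # running total of X^2

    def update(self, batch):
        """Fold one batch into the running totals and return the current TSS."""
        batch = np.asarray(batch, dtype=float)
        self.n += batch.size
        self.sum_x += batch.sum()
        self.sum_x2 += np.square(batch).sum()
        return self.tss

    @property
    def tss(self):
        # Assumption: report 0.0 when no samples have been seen yet.
        if self.n == 0:
            return 0.0
        # TSS = sum[X^2] - sum[X]^2 / N
        return self.sum_x2 - self.sum_x ** 2 / self.n


tracker = RunningTSS()
batches = ([1, 2, 3], [4, 5, 6, 7], [8, 9, 10])
results = [tracker.update(b) for b in batches]
print(results)  # TSS after batch1, batch2, batch3: 2.0, 28.0, 82.5

# Cross-check against the direct formula on the full data.
x = np.arange(1, 11, dtype=float)
print(np.sum((x - x.mean()) ** 2))  # 82.5
```

One caveat: because the formula subtracts two large, nearly equal numbers, it can lose precision when the mean is large compared to the spread of the data. For the small integers here that is not an issue.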