
Calculating TSS on streaming data


TSS (total sum of squares) is calculated as sum((x - mean)**2), which is easy to compute when all of the data is available at once. In my case, though, the data streams in continuously and I need to calculate a moving TSS on it. For example, say the whole data set is x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), but it arrives in batches like this:

batch1: [1, 2, 3]
batch2: [4, 5, 6, 7]
batch3: [8, 9, 10]

How can I calculate the moving TSS in this case? An explanation of the logic along with the solution would be highly appreciated.


Solution

  • TSS can be split into two terms, each of which can easily be calculated incrementally:

    TSS = sum[ (X - sum[X]/N)^2 ]

    = sum[ X^2 - 2*X*sum[X]/N + (sum[X])^2/N^2 ]

    = sum[X^2] - 2*(sum[X])^2/N + (sum[X])^2/N

    = sum[X^2] - (sum[X])^2/N

    You only need to maintain running totals of X and X^2, along with a count N of how many samples you've seen so far.
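
As a rough illustration of the running-totals idea, here is a minimal sketch applied to the three batches from the question. The class name `RunningTSS` and its `update` method are hypothetical names chosen for this example, not part of NumPy or any other library.

```python
import numpy as np


class RunningTSS:
    """Hypothetical helper: keeps N, sum[X] and sum[X^2] across batches."""

    def __init__(self):
        self.n = 0           # N: number of samples seen so far
        self.total = 0.0     # running sum[X]
        self.total_sq = 0.0  # running sum[X^2]

    def update(self, batch):
        """Fold one batch into the running totals and return the TSS so far."""
        batch = np.asarray(batch, dtype=float)
        self.n += batch.size
        self.total += float(batch.sum())
        self.total_sq += float(np.square(batch).sum())
        return self.tss

    @property
    def tss(self):
        """TSS = sum[X^2] - (sum[X])^2 / N over everything seen so far."""
        if self.n == 0:
            return 0.0
        return self.total_sq - self.total ** 2 / self.n


acc = RunningTSS()
for batch in ([1, 2, 3], [4, 5, 6, 7], [8, 9, 10]):
    print(acc.update(batch))
# prints 2.0, then 28.0, then 82.5
```

The final value matches `((x - x.mean())**2).sum()` on the full array. One caveat: subtracting two large, nearly equal totals can lose floating-point precision when the mean is large compared with the spread of the data. If that matters for your data, an update that tracks the running mean directly (for example Welford's method) is a common alternative.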