Search code examples
pythonpandastime-seriessamplingdownsampling

Downsample time series whenever abs(difference) since the previous sample exceeds threshold


I have a timeseries of intraday tick-by-tick stock prices that change gradually over time. Whenever there is a small change (e.g. the price increases by $0.01), a new row of data is created. This leads to a very large data series which is slow to plot. I want to downsample so that small changes (e.g. the price goes up/down/up/down/up/down and is unchanged after 50 rows of data) are ignored, which improves plotting speed without sacrificing the qualitative accuracy of the graph. I only want to sample if the price goes up/up/up/up so that I am only displaying obvious changes.

import pandas as pd
import numpy as np
prices = pd.DataFrame(np.random.randint(0,1000, size=(100, 1))/100+1000, columns=list('A'))

I wish to sample whenever the difference with the previous sample exceeds some threshold. So, I will sample row 0 by default. If row 1, 2, 3 and 4 are too close to row 0, I want to throw them away. Then, if row 5 is sufficiently far away from row 0, I will sample that. Then, row 5 becomes my new anchor point, and I will repeat the same process described immediately above.

Is there a way to do this, ideally without a loop?


Solution

  • You could apply a down-sampling masking function that checks if the distance has been exceeded. Then use that to select to select the applicable rows.

    Here is the down-sampling masking function:

    def down_mask(x, max_dist=3):
        global cum_diff
        
        # if NaN return True
        if x!=x:
            return True
        
        cum_diff += x
        if abs(cum_diff) > max_dist:
            cum_diff = 0
            return True
        
        return False
        
    

    Then apply it and use it as a mask to get the entries that you want:

    cum_diff = 0
    
    df[df['prices'].diff().apply(down_mask, max_dist=5)]
    
         prices
    0   1002.07
    1   1007.37
    2   1000.09
    6   1008.08
    10  1001.57
    14  1006.74
    18  1000.42
    19  1006.98
    21  1001.30
    26  1008.89
    28  1003.77
    38  1009.04
    40  1000.52
    44  1007.06
    47  1001.21
    48  1009.38
    49  1001.81
    51  1008.64
    52  1002.72
    55  1008.84
    56  1000.86
    57  1007.17
    67  1001.31
    68  1006.33
    79  1001.14
    98  1009.74
    99  1000.53