python pandas resampling downsampling activity-recognition

Downsample the Time Series data of Accelerometer and Gyroscope


I have time series data for physical activities, recorded at 50 Hz. I now want to downsample it to 20 Hz, because I want to train the model and run predictions at 20 Hz.

Is there an efficient way to do that in Python? I've heard of pandas' resample function, but I don't know exactly how to use it for my problem. Any piece of code would be really helpful.

   epoch (ms)              time (10:00)  elapsed (s)  x-axis (g)  y-axis (g)  z-axis (g)
1613977400899   2021-02-22T12:03:20.899            0      -0.336       0.886       0.649
1613977400920   2021-02-22T12:03:20.920        0.021      -0.233       0.799       0.648
1613977400940   2021-02-22T12:03:20.940        0.041      -0.173       0.771       0.629
1613977400961   2021-02-22T12:03:20.961        0.062      -0.132       0.757       0.596
1613977400981   2021-02-22T12:03:20.981        0.082      -0.113       0.724       0.57
1613977401002   2021-02-22T12:03:21.002        0.103      -0.127       0.713       0.538
1613977401021   2021-02-22T12:03:21.021        0.122      -0.175       0.743       0.488
1613977401041   2021-02-22T12:03:21.041        0.142      -0.266       0.775       0.417
1613977401062   2021-02-22T12:03:21.062        0.163      -0.281       0.774       0.402
1613977401082   2021-02-22T12:03:21.082        0.183      -0.212       0.713       0.427
1613977401103   2021-02-22T12:03:21.103        0.204      -0.17        0.649       0.46
1613977401123   2021-02-22T12:03:21.123        0.224      -0.204       0.649       0.524
1613977401144   2021-02-22T12:03:21.144        0.245      -0.313       0.684       0.658
1613977401164   2021-02-22T12:03:21.164        0.265      -0.415       0.727       0.785
1613977401183   2021-02-22T12:03:21.183        0.284      -0.419       0.726       0.82

Solution

  • The main issue here seems to be that your original sampling interval is only “roughly” 20 ms (50 Hz), not exactly: the gaps between rows in the sample are 19, 20, or 21 ms. So we need to resample in 2 steps:

    1. Upsample to 1 ms, where we can choose which interpolation to use
    2. Downsample to 50 ms, i.e. 20 Hz (which is just picking one row out of every 50, so easy)

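    First let’s build a time index. The timestamp is stored twice, so either of these lines will work and you only need one (the output below comes from the second). Note that the epoch column gives UTC times (07:03:20.899 for the first row), 5 hours off the local-time column; this doesn’t affect the resampling: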

    >>> df = df.set_index(df['epoch (ms)'].astype('datetime64[ms]'))
    >>> df = df.set_index(pd.to_datetime(df['time (10:00)']))
    >>> df
                                epoch (ms)             time (10:00)  elapsed (s)  x-axis (g)  y-axis (g)  z-axis (g)
    time (10:00)                                                                                                    
    2021-02-22 12:03:20.899  1613977400899  2021-02-22T12:03:20.899        0.000      -0.336       0.886       0.649
    2021-02-22 12:03:20.920  1613977400920  2021-02-22T12:03:20.920        0.021      -0.233       0.799       0.648
    2021-02-22 12:03:20.940  1613977400940  2021-02-22T12:03:20.940        0.041      -0.173       0.771       0.629
    2021-02-22 12:03:20.961  1613977400961  2021-02-22T12:03:20.961        0.062      -0.132       0.757       0.596
    2021-02-22 12:03:20.981  1613977400981  2021-02-22T12:03:20.981        0.082      -0.113       0.724       0.570
    2021-02-22 12:03:21.002  1613977401002  2021-02-22T12:03:21.002        0.103      -0.127       0.713       0.538
    2021-02-22 12:03:21.021  1613977401021  2021-02-22T12:03:21.021        0.122      -0.175       0.743       0.488
    2021-02-22 12:03:21.041  1613977401041  2021-02-22T12:03:21.041        0.142      -0.266       0.775       0.417
    2021-02-22 12:03:21.062  1613977401062  2021-02-22T12:03:21.062        0.163      -0.281       0.774       0.402
    2021-02-22 12:03:21.082  1613977401082  2021-02-22T12:03:21.082        0.183      -0.212       0.713       0.427
    2021-02-22 12:03:21.103  1613977401103  2021-02-22T12:03:21.103        0.204      -0.170       0.649       0.460
    2021-02-22 12:03:21.123  1613977401123  2021-02-22T12:03:21.123        0.224      -0.204       0.649       0.524
    2021-02-22 12:03:21.144  1613977401144  2021-02-22T12:03:21.144        0.245      -0.313       0.684       0.658
    2021-02-22 12:03:21.164  1613977401164  2021-02-22T12:03:21.164        0.265      -0.415       0.727       0.785
    2021-02-22 12:03:21.183  1613977401183  2021-02-22T12:03:21.183        0.284      -0.419       0.726       0.820
    

    (Now we don’t really need the epoch and time columns any more, as the info is in the index)
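
A minimal sketch of dropping those columns, assuming the column names from the question and using only three of its rows. The name `signal` is just an illustrative choice. Keeping only the numeric axes also avoids what shows up in the resampled output further down, where the epoch turns into a float and the time strings no longer match the index; recent pandas versions may additionally warn about or reject interpolating a string column.

```python
import pandas as pd

# Three rows copied from the question's sample, enough to show the idea.
df = pd.DataFrame({
    "epoch (ms)": [1613977400899, 1613977400920, 1613977400940],
    "time (10:00)": ["2021-02-22T12:03:20.899", "2021-02-22T12:03:20.920",
                     "2021-02-22T12:03:20.940"],
    "elapsed (s)": [0.0, 0.021, 0.041],
    "x-axis (g)": [-0.336, -0.233, -0.173],
    "y-axis (g)": [0.886, 0.799, 0.771],
    "z-axis (g)": [0.649, 0.648, 0.629],
})

# Build the time index from the ISO strings.
df = df.set_index(pd.to_datetime(df["time (10:00)"]))

# Keep only the sensor axes; the timing information now lives in the index.
signal = df[["x-axis (g)", "y-axis (g)", "z-axis (g)"]].rename_axis("time")
print(signal)
```

The resampling calls below can then be applied to `signal` instead of `df`.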

    Now we can do the resampling:

    >>> df.resample('1ms').interpolate().resample('50ms').last()
                               epoch (ms)             time (10:00)  elapsed (s)  x-axis (g)  y-axis (g)  z-axis (g)
    time (10:00)                                                                                                   
    2021-02-22 12:03:20.850  1.613977e+12  2021-02-22T12:03:20.899        0.000   -0.336000    0.886000    0.649000
    2021-02-22 12:03:20.900  1.613977e+12  2021-02-22T12:03:20.940        0.050   -0.155429    0.765000    0.614857
    2021-02-22 12:03:20.950  1.613977e+12  2021-02-22T12:03:20.981        0.100   -0.125000    0.714571    0.542571
    2021-02-22 12:03:21.000  1.613977e+12  2021-02-22T12:03:21.041        0.150   -0.271714    0.774619    0.411286
    2021-02-22 12:03:21.050  1.613977e+12  2021-02-22T12:03:21.082        0.200   -0.178000    0.661190    0.453714
    2021-02-22 12:03:21.100  1.613977e+12  2021-02-22T12:03:21.144        0.250   -0.338500    0.694750    0.689750
    2021-02-22 12:03:21.150  1.613977e+12  2021-02-22T12:03:21.183        0.284   -0.419000    0.726000    0.820000
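
One detail worth checking in the output above: each row is labelled with the start of its 50 ms bin, but `.last()` returns the value from the end of the bin. The row labelled 12:03:20.900 has elapsed 0.050, so it actually holds the value at 12:03:20.949. The sketch below, which uses only the first eight x-axis samples from the question, compares `.last()` with `.first()`; whether the 49 ms offset matters depends on your use case.

```python
import pandas as pd

# First eight x-axis samples copied from the question.
times = pd.to_datetime([
    "2021-02-22T12:03:20.899", "2021-02-22T12:03:20.920",
    "2021-02-22T12:03:20.940", "2021-02-22T12:03:20.961",
    "2021-02-22T12:03:20.981", "2021-02-22T12:03:21.002",
    "2021-02-22T12:03:21.021", "2021-02-22T12:03:21.041",
])
x = pd.Series([-0.336, -0.233, -0.173, -0.132, -0.113, -0.127, -0.175, -0.266],
              index=times, name="x-axis (g)")

# Step 1: upsample to a 1 ms grid with linear interpolation.
fine = x.resample("1ms").interpolate()

# Step 2: downsample to 50 ms (20 Hz).
by_last = fine.resample("50ms").last()    # value from the end of each bin
by_first = fine.resample("50ms").first()  # value from the start of each bin

label = pd.Timestamp("2021-02-22 12:03:20.900")
print(by_last.loc[label])   # same as fine at 12:03:20.949
print(by_first.loc[label])  # same as fine at 12:03:20.900
```

With `.first()`, every full bin reports the interpolated value at exactly the timestamp shown in its label. The first bin is an exception in both variants, since it only contains the single sample at 12:03:20.899.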
    

    Note that you can choose a different type of interpolation through the method argument of .interpolate(). From the pandas documentation:

    method : str, default ‘linear’
    Interpolation technique to use. One of:

    • ‘linear’: Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.
    • ‘time’: Works on daily and higher resolution data to interpolate given length of interval.
    • ‘index’, ‘values’: use the actual numerical values of the index.
    • ‘pad’: Fill in NaNs using existing values.
    • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).
    • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes.
    • ‘from_derivatives’: Refers to scipy.interpolate.BPoly.from_derivatives which replaces ‘piecewise_polynomial’ interpolation method in scipy 0.18.
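
To make the difference between ‘linear’ and ‘time’ concrete, here is a small sketch on a made-up series (the values are hypothetical, not taken from the question) whose index is unevenly spaced:

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: known values at t = 0 ms and t = 10 ms,
# and a missing value at t = 1 ms, so the index is unevenly spaced.
idx = pd.to_datetime([0, 1, 10], unit="ms")
s = pd.Series([0.0, np.nan, 10.0], index=idx)

# 'linear' ignores the index and treats the rows as equally spaced,
# so the gap is filled with the midpoint of its neighbours.
linear = s.interpolate(method="linear")

# 'time' weights by the actual timestamps: 1 ms out of 10 ms.
timed = s.interpolate(method="time")

print(linear.iloc[1])  # about 5.0
print(timed.iloc[1])   # about 1.0
```

After upsampling to a regular 1 ms grid, the rows really are equally spaced, so the two methods should agree there.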

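    Here ‘time’ gives exactly the same values as the default ‘linear’, because the 1 ms grid is evenly spaced, while ‘cubic’ (which needs SciPy) gives slightly different values. It’s up to you to pick the method that suits your data: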

    >>> df.resample('1ms').interpolate('time').resample('50ms').last()
                               epoch (ms)             time (10:00)  elapsed (s)  x-axis (g)  y-axis (g)  z-axis (g)
    time (10:00)                                                                                                   
    2021-02-22 12:03:20.850  1.613977e+12  2021-02-22T12:03:20.899        0.000   -0.336000    0.886000    0.649000
    2021-02-22 12:03:20.900  1.613977e+12  2021-02-22T12:03:20.940        0.050   -0.155429    0.765000    0.614857
    2021-02-22 12:03:20.950  1.613977e+12  2021-02-22T12:03:20.981        0.100   -0.125000    0.714571    0.542571
    2021-02-22 12:03:21.000  1.613977e+12  2021-02-22T12:03:21.041        0.150   -0.271714    0.774619    0.411286
    2021-02-22 12:03:21.050  1.613977e+12  2021-02-22T12:03:21.082        0.200   -0.178000    0.661190    0.453714
    2021-02-22 12:03:21.100  1.613977e+12  2021-02-22T12:03:21.144        0.250   -0.338500    0.694750    0.689750
    2021-02-22 12:03:21.150  1.613977e+12  2021-02-22T12:03:21.183        0.284   -0.419000    0.726000    0.820000
    >>> df.resample('1ms').interpolate('cubic').resample('50ms').last()
                               epoch (ms)             time (10:00)  elapsed (s)  x-axis (g)  y-axis (g)  z-axis (g)
    time (10:00)                                                                                                   
    2021-02-22 12:03:20.850  1.613977e+12  2021-02-22T12:03:20.899        0.000   -0.336000    0.886000    0.649000
    2021-02-22 12:03:20.900  1.613977e+12  2021-02-22T12:03:20.940        0.050   -0.153162    0.766266    0.615219
    2021-02-22 12:03:20.950  1.613977e+12  2021-02-22T12:03:20.981        0.100   -0.122950    0.711454    0.543581
    2021-02-22 12:03:21.000  1.613977e+12  2021-02-22T12:03:21.041        0.150   -0.285487    0.781273    0.403123
    2021-02-22 12:03:21.050  1.613977e+12  2021-02-22T12:03:21.082        0.200   -0.172478    0.656944    0.452494
    2021-02-22 12:03:21.100  1.613977e+12  2021-02-22T12:03:21.144        0.250   -0.342439    0.695493    0.693425
    2021-02-22 12:03:21.150  1.613977e+12  2021-02-22T12:03:21.183        0.284   -0.419000    0.726000    0.820000