python arrays numpy time-series sktime

Reshape of dataset (Time Series) after filtering?


I am using sktime's HampelFilter to detect outliers in my dataset, but I ran into a problem after applying the filter. My dataset contains time series (signals). The dataset array has shape (1, 167), and each of the 167 elements contains 9000 samples.

The original data was imported from a .mat file into Python.

I wrote this code to apply the HampelFilter to each element of the single row:

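from sktime.transformations.series.outlier_detection import HampelFilter

# data is the (1, 167) array loaded from the .mat file
def Hampel(H):
    return HampelFilter(window_length=10).fit_transform(H)

My_Data = data[0, :]

filterd_My_Data = []
for H in My_Data:
    a = Hampel(H)
    filterd_My_Data.append(a)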

Before filtering, the data has shape (1, 167) and its type is 'numpy.ndarray', as shown below:

[array([[0.31494141],
        [0.30151367],
        [0.30395508],
        ...,
        [0.3125    ],
        [0.31738281],
        [0.3112793 ]]) array([[0.30151367],
                              [0.30883789],
                              [0.29907227],
                              ...,
                              [0.31738281],
                              [0.36132812],
                              [0.31738281]]) array([[0.29541016],
                                                    [0.29663086],
                                                    [0.29296875],
                                                    ...,
                                                    [0.28686523],
                                                    [0.29907227],
                                                    [0.29663086]])
 array([[0.31616211],
        [0.3112793 ],
        [0.30273438],
        ...,
        [0.31494141],
        [0.32958984],
        [0.3137207 ]]) array([[0.28442383],
                              [0.28930664],
                              [0.28442383],
                              ...,
                              [0.30029297],
                              [0.30151367],
                              [0.31494141]]) array([[0.30761719],
                                                    [0.31005859],
                                                    [0.30639648],
                                                    ...,
                                                    [0.32836914],
                                                    [0.30761719],
                                                    [0.30273438]])

After filtering, the result is a 'list' of length 167, although the input was a 'numpy.ndarray', as shown below:

[array([[       nan],
       [0.30151367],
       [0.30395508],
       ...,
       [0.3125    ],
       [0.31738281],
       [0.3112793 ]]), array([[0.30151367],
       [0.30883789],
       [0.29907227],
       ...,
       [0.31738281],
       [       nan],
       [0.31738281]]), array([[0.29541016],
       [0.29663086],
       [0.29296875],
       ...,
    

Then I converted the list to an array, and its shape became (167, 9000, 1):

[[[       nan]
  [0.30151367]
  [0.30395508]
  ...
  [0.3125    ]
  [0.31738281]
  [0.3112793 ]]

 [[0.30151367]
  [0.30883789]
  [0.29907227]
  ...
  [0.31738281]
  [       nan]
  [0.31738281]]

 [[0.29541016]
  [0.29663086]
  [0.29296875]
  ...
  [0.28686523]
  [0.29907227]
  [0.29663086]]

 ...

The code works very well and removes all the outliers, but I cannot add the new array to my dataset because its shape does not match.

Now my array has shape (167,9000). I want to change it to (167,1), so that each element contains all the samples of one time series.

How can I restore the array to the shape it had before filtering? Or, how can I make one element of an array hold several values?


Solution

  • The signal has the same length before and after filtering, and you append multiple signals (each 9000 samples long), so you end up with a 167-element list of signals that are 9000 points long. Why do you expect to get a 1D array? You get a list of arrays ...

    import numpy as np
    from sktime.transformations.series.outlier_detection import HampelFilter
    
    # toy filter function
    def hampel_filter(sig):  # it is good style to save upper- and camel-case names for classes
        return HampelFilter().fit_transform(sig)
    
    # generate toy data
    data = np.random.rand(1, 3, 90)  # shape(data): (1, 3, 90)
    print(f'np.shape(data.shape): {data.shape}')
    
    mydata = data[0, :]  # shape(mydata): (3, 90)
    print(f'shape(mydata): {mydata.shape}')
    
    mydata_filtered = []
    for signal in mydata:  # shape(signal): (90,) => a 1D array here; the filter output is a (90, 1) column
        print(f'shape(signal): {np.shape(signal)}')
        signal_filtered = hampel_filter(signal)
        mydata_filtered.append(signal_filtered)
    print(f'shape(mydata_filtered): {np.shape(mydata_filtered)}')
    

    and you'll get:

    np.shape(data.shape): (1, 3, 90)
    shape(mydata): (3, 90)
    shape(signal): (90,)
    shape(signal): (90,)
    shape(signal): (90,)
    shape(mydata_filtered): (3, 90, 1)
    

    If you need a regular 2D array, flatten the filtered signal inside hampel_filter before returning it. You would then get:

    np.shape(data.shape): (1, 3, 90)
    shape(mydata): (3, 90)
    shape(signal): (90,)
    shape(signal): (90,)
    shape(signal): (90,)
    shape(mydata_filtered): (3, 90)
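
If you really want the original layout back (one row in which each cell holds a whole signal), one possible approach is to fill a NumPy object array cell by cell. The following is only a sketch using NumPy alone: `pack_as_object_row` is a hypothetical helper name, and the random `filtered` list merely stands in for the list of (90, 1) arrays collected in the loop above.

```python
import numpy as np

def pack_as_object_row(signals):
    """Pack a list of arrays into a (1, n) object array, one array per cell."""
    packed = np.empty((1, len(signals)), dtype=object)
    for i, sig in enumerate(signals):
        packed[0, i] = sig  # each cell stores a whole array as one object
    return packed

# Stand-in for the filtered output: 3 column vectors of shape (90, 1)
# (hypothetical toy data, not real filter output).
rng = np.random.default_rng(0)
filtered = [rng.random((90, 1)) for _ in range(3)]

stacked = np.array(filtered)                # regular float array, shape (3, 90, 1)
flat = stacked.reshape(len(filtered), -1)   # drop the trailing axis: shape (3, 90)
packed = pack_as_object_row(filtered)       # object array, shape (1, 3)

print(stacked.shape, flat.shape, packed.shape, packed[0, 0].shape)
```

Note that calling `np.array(filtered, dtype=object)` directly would still produce a (3, 90, 1) array when all signals have the same length, which is why the sketch allocates the empty object array first and assigns each cell explicitly.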