python, time-series, feature-extraction, tsfresh

How to use the tsfresh Python package to extract features from time-series data?


I have a list of lists where each inner list represents one time series:

tsli=[[43,65,23,765,233,455,7,32,57,78,4,32],[34,32,565,87,23,86,32,56,32,57,78,32],[87,43,12,46,32,46,13,23,6,90,67,8],[1,2,3,3,4,5,6,7,8,9,0,9],[12,34,56,76,34,12,45,67,34,21,12,22]]

I want to extract features from this dataset with the tsfresh package using this code:

import tsfresh
tf=tsfresh.extract_features(tsli)

When I run it, I get this ValueError:

> ValueError: You have to set the column_id which contains the ids of the different time series
But I don't know how to deal with this or how to define the column id for this problem.

EDIT 1: As suggested, I converted the dataset into a pandas DataFrame and then tried:

import pandas as pd
import tsfresh
df=pd.DataFrame(tsli)
tf=tsfresh.extract_features(df)

but I get the same ValueError:

> ValueError: You have to set the column_id which contains the ids of the different time series

Any resource or reference will be helpful.

Thanks
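Why the EDIT 1 attempt still fails: `pd.DataFrame(tsli)` builds a table with one row per series and one column per time step (columns 0 to 11). That is transposed relative to what tsfresh expects, which is one row per observation plus a column that says which series each row belongs to. A minimal pandas-only sketch, assuming you want to start from that same DataFrame, reshapes it with `melt`; the column names `id`, `time` and `value` are illustrative choices, not names tsfresh requires.

```python
import pandas as pd

tsli = [[43, 65, 23, 765, 233, 455, 7, 32, 57, 78, 4, 32],
        [34, 32, 565, 87, 23, 86, 32, 56, 32, 57, 78, 32],
        [87, 43, 12, 46, 32, 46, 13, 23, 6, 90, 67, 8],
        [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 0, 9],
        [12, 34, 56, 76, 34, 12, 45, 67, 34, 21, 12, 22]]

# Same "wide" frame as in EDIT 1: row = series, column = time step
wide = pd.DataFrame(tsli)

long_df = (
    wide.rename_axis('id')          # name the row index so it becomes the id column
        .reset_index()
        .melt(id_vars='id', var_name='time', value_name='value')  # one row per observation
        .sort_values(['id', 'time'], ignore_index=True)           # series by series, in time order
)
print(long_df.head())
```

The result can then be passed as `tsfresh.extract_features(long_df, column_id='id', column_sort='time')`.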


Solution

  • First, convert your list into a DataFrame in "long" format: one row per observation, plus a column holding the unique id of the time series it belongs to, e.g.:

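    import pandas as pd
    import tsfresh

    # DataFrame.append was removed in pandas 2.0, so collect the rows first
    rows = []
    for i, ts in enumerate(tsli):
        # one row per observation: [value, id of the series it came from]
        rows.extend([x, i] for x in ts)
    df = pd.DataFrame(rows, columns=['value', 'id'])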
    


    Now you can call tsfresh.extract_features with the column_id argument set to the new column:

    tf=tsfresh.extract_features(df, column_id='id')
    
    
    >> Feature Extraction: 100%|██████████| 5/5 [00:00<00:00, 36.83it/s]
    

    For another example, see the tsfresh Quick Start guide.