python time-series feature-extraction feature-selection tsfresh

Selecting only a certain number of top features using tsfresh


How can I select the top n features of a time series using tsfresh? Can I choose how many top features to extract?


Solution

  • Based on the comment above from @Chaitra and this answer, here is a solution.

    You can choose the number of top features by using the tsfresh relevance table described in the documentation. Sort the table by p-value and take the top n features.

    Example code printing top 11 features:

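    from tsfresh import extract_features
    from tsfresh.feature_selection.relevance import calculate_relevance_table
    from tsfresh.utilities.dataframe_functions import impute

    # X is the long-format time series DataFrame; y is the target, indexed by id
    extracted_features = extract_features(
        X,
        column_id="id",
        column_kind="kind",
        column_value="value",
    )
    # extract_features can produce NaN/inf values; impute replaces them in place
    impute(extracted_features)

    relevance_table = calculate_relevance_table(extracted_features, y)
    # Keep only the features marked relevant, then order by ascending p-value
    relevance_table = relevance_table[relevance_table.relevant]
    relevance_table = relevance_table.sort_values("p_value")
    # Print the names of the 11 features with the smallest p-values
    print(relevance_table["feature"][:11])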
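
To actually keep only the top n features, and not just print their names, the ranking step can be wrapped in a small helper whose result is used to subset the feature matrix. The following is a minimal sketch: the helper name `top_n_features` and both toy tables are illustrative assumptions, not part of tsfresh. The column names `feature`, `p_value` and `relevant` mirror the ones used in the snippet above, and the values are invented.

```python
import pandas as pd


def top_n_features(relevance_table: pd.DataFrame, n: int) -> list:
    """Return the names of the n relevant features with the smallest p-values."""
    relevant = relevance_table[relevance_table["relevant"]]
    # nsmallest orders by ascending p_value and keeps the first n rows
    return relevant.nsmallest(n, "p_value")["feature"].tolist()


# Hand-made stand-in for the output of calculate_relevance_table (invented values)
toy_table = pd.DataFrame(
    {
        "feature": ["value__mean", "value__maximum", "value__minimum", "value__median"],
        "p_value": [0.03, 0.001, 0.20, 0.01],
        "relevant": [True, True, False, True],
    }
)

# Hand-made stand-in for the extracted feature matrix (one row per time series id)
toy_features = pd.DataFrame(
    {
        "value__mean": [1.0, 2.0],
        "value__maximum": [3.0, 4.0],
        "value__minimum": [0.0, 1.0],
        "value__median": [1.5, 2.5],
    },
    index=[1, 2],
)

top_two = top_n_features(toy_table, 2)
reduced_features = toy_features[top_two]  # keep only the selected columns
print(top_two)
```

With a real relevance table, the same call would be `top_n_features(relevance_table, 11)`, followed by `extracted_features[...]` to reduce the feature matrix. If fewer than n features are marked relevant, the helper simply returns all of them.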