I used the sktime library's TimeSeriesForestClassifier class to perform multivariate time series classification.
The code is as follows
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
steps = [
("concatenate", ColumnConcatenator()),
("classify", TimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
I would like to check the value of feature_importances_, which is not the same length as the input, but an array with the same length as the number of features.
clf.steps[1][1].feature_importances_
I would like to know which part of the input each importance corresponds to. Is there any way to get information about which section of the input the TimeSeriesForestClassifier is calculating features from?
You can get the intervals (start and end index) for each tree of the ensemble from:
clf.steps[1][1].intervals_
sktime now also has an implementation of the newer Canonical Interval Forecast.
When we first implemented the Time Series Forest algorithm, we ended up with two versions. The one that you're using is the recommended one, but the older version provides its own functionality for the feature importance graph (see below).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.compose import ComposableTimeSeriesForestClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
steps = [
("concatenate", ColumnConcatenator()),
("classify", ComposableTimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
clf.steps[-1][-1].feature_importances_.rename(columns={"_slope": "slope"}).plot(xlabel="time", ylabel="feature importance")
Be aware of some subtle issues in the calculation and interpretation of the feature importances. The relevant issues are here: