I used FeatureAgglomeration to cluster the features of my 105x105 dataframe into 40 clusters based on Spearman correlation. Now I want to get the output feature names using feature_names_in_ and get_feature_names_out, but it does not seem to work, and I cannot find a solution. This is my code:
import pandas as pd
import numpy as np
from sklearn.cluster import FeatureAgglomeration
features = np.array([...])
print(features.shape)
>>> (105,)
Class1_rank=pd.read_excel(r'H:\PycharmProjects\RadiomicsPipeline\Class1_rank.xlsx')
print(Class1_rank)
>>> original_shape_Elongation ... original_ngtdm_Strength
original_shape_Elongation 1.000000 ... -0.054310
original_shape_Flatness 0.616327 ... -0.019544
original_shape_LeastAxisLength 0.271645 ... -0.293157
>>> [105 rows x 105 columns]
print(agglo.n_features_in_)
>>> 105
print(agglo.feature_names_in_(Class1_rank))
print(agglo.get_feature_names_out())
df_reduced = agglo.transform(Class1)
At print(agglo.feature_names_in_()) I get the following error:
TypeError: 'numpy.ndarray' object is not callable
However, Class1_rank is a DataFrame, so it should not give that error? What am I doing wrong here?
What I have tried:
Commenting out print(agglo.feature_names_in_(Class1_rank)). This works, but then print(agglo.get_feature_names_out()) gives the following result, and not the names of the features I included:
['featureagglomeration0' 'featureagglomeration1' 'featureagglomeration2' 'featureagglomeration3' 'featureagglomeration4'....]
Using features as input for both functions. This gives the same error.
Inserting the features as strings for Class1_rank. This gives the same error.
feature_names_in_ is an array, not a callable, so agglo.feature_names_in_ is correct, but parentheses after it (empty or not) are incorrect.
get_feature_names_out() gives names for each cluster, which are not in 1-1 correspondence with input features, so it cannot give you something like the original feature names. You can use the labels_ attribute to find which input features go into which output features, see e.g. this answer.