I have a dataset that I use to build a classifier:
dataset = pd.read_csv(sys.argv[1], decimal=",",delimiter=";", encoding='cp1251')
X=dataset.loc[:, dataset.columns != 'class']
Y=dataset['class']
I want to select important features only, so I do:
clf=svm.SVC(probability=True, gamma=0.017, C=5, coef0=0.00001, kernel='linear', class_weight='balanced')
model = SelectFromModel(clf, prefit=True)
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=0.5, random_state=5)
y_pred=clf.fit(X_train, Y_train).predict(X_test)
X_new = model.transform(X)
So X_new has shape 3000x72 while X had shape 3000x130. I would like to get a list of the features that are, and are not, in X_new. How can I do that?
X was a DataFrame with a header, but X_new is a plain array of feature values without any names, so I can't merge it the way I would in pandas. Thank you for any help!
clf.coef_ returns an array of feature weights (it is only available after calling fit(), and only for a linear kernel). Sort the features by the absolute value of their weights and you will see which ones are not very useful.
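To recover the names directly, SelectFromModel also has a get_support() method that returns a boolean mask over the input columns, which can be applied to the DataFrame's columns. Below is a minimal sketch of that idea, not the asker's exact pipeline: the data is synthetic (make_classification), the column names f0..f19 are made up for illustration, and the SVC parameters are trimmed to the ones that matter for a linear kernel.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import SVC

# Synthetic stand-in for the question's data; column names are hypothetical.
features, labels = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=5
)
X = pd.DataFrame(features, columns=[f"f{i}" for i in range(features.shape[1])])
Y = pd.Series(labels, name="class")

# Fit the linear SVM first, then wrap the fitted estimator.
clf = SVC(kernel="linear", C=5, class_weight="balanced")
clf.fit(X, Y)
model = SelectFromModel(clf, prefit=True)

# Boolean mask over the original columns: True means the feature was kept.
mask = model.get_support()
selected = X.columns[mask].tolist()
dropped = X.columns[~mask].tolist()

# Selecting by mask keeps X_new as a DataFrame with its header intact.
X_new = X.loc[:, mask]

# Rank every feature by absolute weight, as suggested above.
weights = pd.Series(np.abs(clf.coef_).sum(axis=0), index=X.columns)
ranking = weights.sort_values(ascending=False)

print("kept:", selected)
print("dropped:", dropped)
print(ranking.head())
```

Indexing X with the mask instead of calling model.transform(X) is a design choice here: it selects the same columns but keeps the column labels, so the result can still be merged or inspected like any other DataFrame.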