I have a binary classification text dataset with 10 text features.
I use various techniques like bag of words, TF-IDF, etc. to convert them to numerical features.
After processing them, I use hstack() to stack all those features back together.
Each feature now spans a large number of columns, so after conversion my dataset has around 3000 columns.
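A minimal sketch of the pipeline described above, assuming scikit-learn's CountVectorizer and TfidfVectorizer plus scipy.sparse.hstack. The two column names and the toy rows are hypothetical stand-ins for the 10 real text features. The sketch also records which column range each original feature occupies, because the stacked matrix itself keeps no such information.

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy data: two text features standing in for the 10 real ones.
df = pd.DataFrame({
    "US_states": ["texas", "ohio", "texas", "utah"],
    "description": ["cheap fast shipping", "slow service", "fast service", "cheap item"],
})

# One vectorizer per text feature (bag of words for one, TF-IDF for the other).
vectorizers = {
    "US_states": CountVectorizer(),
    "description": TfidfVectorizer(),
}

blocks = []
column_ranges = {}  # feature name -> (start, stop) column indices in the stacked matrix
start = 0
for name, vec in vectorizers.items():
    block = vec.fit_transform(df[name])  # sparse matrix, one column per vocabulary term
    blocks.append(block)
    column_ranges[name] = (start, start + block.shape[1])
    start += block.shape[1]

# Stack all blocks side by side; only our own bookkeeping remembers the boundaries.
X = hstack(blocks).tocsr()
print(X.shape)
print(column_ranges)
```

With this toy data the states vocabulary has 3 terms and the description vocabulary has 6, so X has 9 columns and the ranges are (0, 3) and (3, 9).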
My question is: when I fit this dataset with a decision tree classifier (sklearn), how does the classifier recognize which columns belong to a particular feature?
For example, the first 51 columns out of 3000 belong to the US_states bag of words.
Now, how will the decision tree recognize that?
PS: Before processing, the data is in a pandas DataFrame.
After processing, it is a stacked numpy array that is fed into the classifier.
The decision tree won't recognize which original feature each column comes from: it treats every column as an independent numeric input and picks its splits column by column.
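To illustrate, here is a small sketch assuming scikit-learn's DecisionTreeClassifier on a toy stacked matrix (the column names, rows, and labels are hypothetical). The fitted tree only stores column indices in `tree_.feature`; mapping those indices back to an original feature requires the column ranges you record yourself while stacking. The helper `owner` and the per-feature sum of `feature_importances_` are illustrative bookkeeping, not part of the sklearn API.

```python
import numpy as np
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data with binary labels.
df = pd.DataFrame({
    "US_states": ["texas", "ohio", "texas", "utah", "ohio", "utah"],
    "description": ["cheap fast", "slow service", "fast service",
                    "cheap item", "slow item", "fast item"],
})
y = np.array([1, 0, 1, 0, 0, 1])

# Vectorize each feature and remember its column range in the stacked matrix.
vectorizers = {"US_states": CountVectorizer(), "description": TfidfVectorizer()}
blocks, column_ranges, start = [], {}, 0
for name, vec in vectorizers.items():
    block = vec.fit_transform(df[name])
    blocks.append(block)
    column_ranges[name] = (start, start + block.shape[1])
    start += block.shape[1]
X = hstack(blocks).tocsr()

# The tree only ever sees X: a matrix of anonymous numeric columns.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

def owner(col):
    """Hypothetical helper: map a column index back to its original text feature."""
    for name, (lo, hi) in column_ranges.items():
        if lo <= col < hi:
            return name
    raise ValueError(f"column {col} is outside every recorded range")

# tree_.feature stores the split column index per node (negative for leaves).
split_cols = [int(c) for c in clf.tree_.feature if c >= 0]
print([(c, owner(c)) for c in split_cols])

# Per-feature importance: sum the importances over each feature's column block.
importance = {name: float(clf.feature_importances_[lo:hi].sum())
              for name, (lo, hi) in column_ranges.items()}
print(importance)
```

The grouping exists only in `column_ranges`; the classifier would fit exactly the same tree if those ranges were never recorded.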