nlp random-forest xgboost sentiment-analysis text-classification

How to distinguish the direction of important features from xgboost or random forest?


I'm working on a binary text classification problem (similar to sentiment analysis). Pulling out the most important features from xgboost or random forest is trivial: just read feature_importances_.

Suppose the two labels for this problem are 1 and 0. Is there any way to print the direction of each feature (positive or negative)? For example, word feature A is enriched, or has a high tfidf, in documents labelled 1.

I could certainly pull out the tfidf column for a specific word feature, compute its Pearson correlation with the labels, and take the sign of the coefficient as the direction, right? Is there a more elegant way, or do xgboost and random forest have built-in functions for this? (I didn't find any.)

Thanks


Solution

  • It isn't exactly what you're asking for, but I usually use LIME for this. I like that it keeps working even if I switch models.
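
LIME explains one prediction at a time: it perturbs the input, queries the model on the perturbed copies, and fits a weighted linear surrogate whose coefficient signs give a local direction for each word. The sketch below hand-rolls that idea with scikit-learn instead of calling the lime package, so the function name local_word_directions, the kernel, its width, and the toy corpus are all assumptions made for illustration.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Toy corpus, invented for illustration only.
docs = [
    "great movie loved it",
    "great acting great plot",
    "loved the plot",
    "terrible movie hated it",
    "terrible acting boring plot",
    "hated the boring plot",
]
labels = [1, 1, 1, 0, 0, 0]

# Any estimator with predict_proba could replace the random forest here.
pipeline = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=200, random_state=0),
).fit(docs, labels)


def local_word_directions(text, predict_proba, kernel_width=0.75):
    """LIME-style local explanation (hypothetical helper, not the lime API).

    Returns {word: surrogate coefficient}; a positive value means the word
    pushes this particular prediction towards label 1.
    """
    tokens = text.split()
    # Every on/off combination of the tokens (only feasible for short texts).
    masks = np.array(list(product([0, 1], repeat=len(tokens))))
    perturbed = [" ".join(t for t, keep in zip(tokens, m) if keep) for m in masks]
    p_pos = predict_proba(perturbed)[:, 1]  # probability of label 1
    # Weight perturbations by closeness to the original text (assumed kernel).
    removed = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(removed ** 2) / kernel_width ** 2)
    surrogate = Ridge(alpha=1.0).fit(masks, p_pos, sample_weight=weights)
    return dict(zip(tokens, surrogate.coef_))


directions = local_word_directions("great plot terrible acting", pipeline.predict_proba)
for word, weight in directions.items():
    print(f"{word:>10}  {weight:+.3f}")
```

Because the helper only needs a predict_proba callable, the same code would run unchanged against an xgboost pipeline, which matches the "works even if I switch models" point. The directions are local to one document, so a word can come out positive for one text and negative for another.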