
Tree-based dimensionality reduction before DNN training


My question is straightforward: is it possible to use a tree-based dimensionality reduction technique, such as the feature importance embedded in a Random Forest, before training a DNN on the dataset?

In other words, does the use of tree-based feature importance prevent the use of training algorithms other than trees/Random Forests?


Solution

  • First, ask yourself why you want to use a Random Forest before DNN training; it helps to read up on how a DNN learns.

    Yes, you can compute and display the feature importances of a Random Forest:

    from sklearn.ensemble import RandomForestClassifier
    from pandas import DataFrame

    # Fit a Random Forest and rank the features by their importance scores
    random_forest = RandomForestClassifier(random_state=42).fit(x_train, y_train)

    feature_importances = DataFrame(random_forest.feature_importances_,
                                    index=x_train.columns,
                                    columns=['importance']).sort_values('importance',
                                                                        ascending=False)

    print(feature_importances)
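
    To actually reduce the dataset before training something else, you can keep only the top-ranked columns. A minimal sketch; k = 10 and the existence of a held-out x_test split are illustrative assumptions, not part of the question:

    # Keep the 10 most important features (k = 10 is an arbitrary example)
    top_features = feature_importances.head(10).index
    x_train_reduced = x_train[top_features]
    x_test_reduced = x_test[top_features]  # assumes a held-out test split exists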
    

    But note that this is a feature-selection method, while a DNN is a neural-network method.

    A DNN is more complex than a Random Forest: while the Random Forest here only provides feature selection, a DNN internally handles (see the sketch after this list)

    • feature extraction,
    • back-propagation,
    • feed-forward computation.
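
    As a concrete illustration of the downstream step, here is a small feed-forward network trained on the reduced features from the sketch above. scikit-learn's MLPClassifier stands in for a DNN; the layer sizes and iteration count are arbitrary assumptions:

    from sklearn.neural_network import MLPClassifier

    # A small feed-forward net; fit() runs forward passes and back-propagation
    dnn = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42)
    dnn.fit(x_train_reduced, y_train)
    print(dnn.score(x_test_reduced, y_test))  # y_test assumed from the same held-out split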

    If you feed a DNN enough training samples, it will typically reach higher accuracy.

    • Does the use of tree-based feature importance prevent the use of other training algorithms?

    No. The feature count and sample size that suffice vary from problem to problem. Usually, you would not use a Random Forest to compute feature importances over a dataset of 1M images.
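
    In fact, scikit-learn lets you chain tree-based selection with any downstream learner in a single estimator. A hedged sketch using SelectFromModel, whose default threshold keeps features above the mean importance; the hyperparameters are illustrative assumptions:

    from sklearn.pipeline import Pipeline
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.neural_network import MLPClassifier

    # Tree-based feature selection feeding a neural network: the selection
    # step does not constrain which algorithm trains afterwards.
    model = Pipeline([
        ('select', SelectFromModel(RandomForestClassifier(random_state=42))),
        ('dnn', MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                              random_state=42)),
    ])
    model.fit(x_train, y_train)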

    Likewise, you usually would not use a DNN on small datasets.