python python-3.x machine-learning data-science random-forest

The naming and the sorting of the trained RF model's features in Python


So I have trained a RandomForest model on fairly simple customer data. The prediction is either 1 or 0, indicating whether a customer will churn or not.

Let's say I have 10 features called 'f1', 'f2', 'f3' and so on. Since the model has already been trained, I took similar data from another period to see how the model performs. But in this data the features could be ordered differently (for example 'f3', 'f10', 'f1', ...). Will the model look at the names of the features, or will the names not matter, so that it thinks 'f1' is 'f3'? Let's say the type of the data is the same in each column.

The reason I am asking is that, to check this theory, I renamed the 'f3' column to 'a', and to my astonishment the model worked anyway. What are your thoughts?


Solution

  • The algorithm works independently of your column names. With most algorithms you can name your columns whatever you want (libraries such as fbprophet, which expect specific column names, are an exception).

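    But there is one important point here: when you predict on a new dataset, its columns must be in the same order as the columns the model was trained on. The model matches features by position, so a column that has moved is silently treated as whichever feature occupied that position during training.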

    In your case you can rename your columns f1, f2, f3, ... to abc1, abc2, def3, ..., but you cannot shuffle their order.
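
    The point about order can be checked with a small experiment. The sketch below is only an illustration under assumptions: it assumes scikit-learn's RandomForestClassifier and pandas (the question does not say which library was used), and the data is synthetic, with the label depending on 'f1' alone. The arrays are passed as plain NumPy so the model only ever sees column positions; some scikit-learn versions also check column names when a DataFrame is passed to both fit and predict, which this sketch deliberately sidesteps.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic, hypothetical data: only f1 determines the label.
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.normal(size=(500, 3)), columns=["f1", "f2", "f3"])
y_train = (train["f1"] > 0).astype(int)

# Remember the column order used for training.
feature_order = list(train.columns)

# Fit on a plain array, so the model knows positions but no names.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(train.to_numpy(), y_train)

# A new period of data, scored three ways.
new = pd.DataFrame(rng.normal(size=(500, 3)), columns=["f1", "f2", "f3"])
y_new = (new["f1"] > 0).astype(int)

# 1) Renamed column, same positions: the values reach the model unchanged.
renamed = new.rename(columns={"f3": "a"})
acc_renamed = model.score(renamed.to_numpy(), y_new)

# 2) Shuffled columns: position 0 now holds f3, which the model reads as f1.
shuffled = new[["f3", "f1", "f2"]]
acc_shuffled = model.score(shuffled.to_numpy(), y_new)

# 3) Shuffled columns put back into the training order by name.
acc_restored = model.score(shuffled[feature_order].to_numpy(), y_new)

print("renamed :", acc_renamed)
print("shuffled:", acc_shuffled)
print("restored:", acc_restored)
```

    Under these assumptions the renamed and restored scores should be identical, while the shuffled score should fall to roughly chance level. Keeping the training column list (here `feature_order`) next to the saved model and selecting columns by it before every prediction is one simple way to guard against reordered input.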