In the definition of FSelector's information.gain function,
information.gain(formula, data)
what exactly is the purpose of formula? I'm trying to use the function for feature selection in a classification task. In the few examples I've seen online, the formula seems to define some kind of relationship between the class label and the features in the dataset. But since this is a classification task, I don't know the exact linear relationship between the features and the labels, so what should the formula be?
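The formula does not describe a linear relationship; it only names the class variable (left of ~) and the attributes to evaluate (right of ~). You can use . on the right-hand side to tell R that you want to analyse the dependency between the class variable and all other variables in the data frame. For example, for the iris dataset: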
> library(FSelector)
> information.gain(Species~., iris)
             attr_importance
Sepal.Length       0.4521286
Sepal.Width        0.2672750
Petal.Length       0.9402853
Petal.Width        0.9554360
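To see what the . stands for, here is a minimal sketch in base R. It is only an illustration of how a formula names columns, not FSelector's internal code; the variable names class_name and feature_names are made up for this example.

```r
# Base R only: illustrates how a formula names a class column and
# feature columns. This is NOT how FSelector is implemented internally.
f <- Species ~ .

# The left-hand side of the formula is the class variable.
class_name <- as.character(f[[2]])

# terms() expands "." into every column of the data frame that is not
# already used elsewhere in the formula.
feature_names <- attr(terms(f, data = iris), "term.labels")

print(class_name)
print(feature_names)
```

So the formula is just a compact way of saying "this column is the label, those columns are the candidate features".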
If you want to analyse the dependency on only a subset of the variables, list them explicitly, separated by +:
> information.gain(Species~Sepal.Length+Sepal.Width, iris)
             attr_importance
Sepal.Length       0.4521286
Sepal.Width        0.2672750
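Since the goal is feature selection, a possible next step is to keep only the highest-scoring attributes. The sketch below assumes FSelector is installed and uses its cutoff.k and as.simple.formula helpers; the choice of k = 2 is arbitrary and only for illustration.

```r
library(FSelector)

# Score every attribute against the class variable.
weights <- information.gain(Species ~ ., iris)

# Keep the names of the k best-scoring attributes (k = 2 is an
# arbitrary choice for this example).
top_attrs <- cutoff.k(weights, 2)

# Build a new formula from the selected attributes, which can then be
# passed to a classifier.
selected_formula <- as.simple.formula(top_attrs, "Species")

print(top_attrs)
print(selected_formula)
```

With the scores shown above, the two selected attributes should be Petal.Width and Petal.Length.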