
How to properly calculate all weights with FSelector package?


I'm trying to calculate feature weights for a dataset in R using the FSelector package. The data is taken from this location.

data = read.csv("filepath/Indian Liver Patient Dataset (ILPD).csv")
names(data)<-c("Age","Gender", "TB", "DB", "Alkphos", "Sgpt", "Sgot", "TP", "ALB", "A/G Ratio", "Selector")
library(FSelector)
weights <- gain.ratio(Selector ~., data)
print(weights)

I can't calculate all of the weights. When I use the gain.ratio function, the weight for Age is NaN. When I use the chi.squared function instead, both Age and A/G Ratio are zero. When I take the first 200 rows of the data and calculate weights, only five of them are calculated correctly; the others are zero or NaN.

I tried removing rows with missing values using data <- na.omit(data), but it didn't change the result.

How can I calculate weights correctly?

Below is an example of the printed weights.

Age             0.0000000
Gender          0.1304229
TB              0.3281865
DB              0.3238010
Alkphos         0.2965842
Sgpt            0.2734633
Sgot            0.3120432
TP              0.2504747
ALB             0.3051724
A/G Ratio       0.0000000

Solution

  • Zero is a valid value for feature importance -- it means that the feature does not have any information with respect to the classification target. The NaNs are caused by a bug in FSelector that divides by 0 if a feature carries no information. I've fixed this in the development version.

    The name "A/G Ratio" is not a valid R identifier and therefore causes problems with some of the methods. Below is the code that renames the column and installs the development version of FSelector.

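    # Spaces inside a quoted R string need no escaping ("\ " is an invalid escape).
    data <- read.csv("Indian Liver Patient Dataset (ILPD).csv")
    # Use a syntactically valid column name: "AGRatio" instead of "A/G Ratio".
    names(data) <- c("Age", "Gender", "TB", "DB", "Alkphos", "Sgpt", "Sgot", "TP", "ALB", "AGRatio", "Selector")
    
    # Install the development version of FSelector, which contains the NaN fix.
    library(devtools)
    install_github("larskotthoff/fselector")
    
    library(FSelector)
    weights <- gain.ratio(Selector ~ ., data)
    print(weights)
    
    weights <- chi.squared(Selector ~ ., data)
    print(weights)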
    

    Output (gain.ratio first, then chi.squared):

            attr_importance
    Age          0.00000000
    Gender       0.01539699
    TB           0.09711392
    DB           0.11547683
    Alkphos      0.06593879
    Sgpt         0.06566624
    Sgot         0.07667241
    TP           0.08836895
    ALB          0.07766682
    AGRatio      0.15403574
    
            attr_importance
    Age           0.0000000
    Gender        0.1304229
    TB            0.3281865
    DB            0.3238010
    Alkphos       0.2965842
    Sgpt          0.2734633
    Sgot          0.3120432
    TP            0.2504747
    ALB           0.3051724
    AGRatio       0.0000000
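
To illustrate the two points made in the answer, here is a minimal base-R sketch. It is not FSelector's implementation: `entropy` and `gain_ratio` are hypothetical helpers written for this example, assuming the textbook definition of gain ratio (information gain divided by the entropy of the feature), and the toy vectors are made up. The first part shows how base R's `make.names` turns a label such as "A/G Ratio" into a valid identifier; the second shows how a feature with a single distinct value produces 0/0.

```r
# Part 1: convert arbitrary column labels into syntactically valid R names.
raw_names <- c("Age", "A/G Ratio", "Selector")
clean_names <- make.names(raw_names)
print(clean_names)  # "Age" "A.G.Ratio" "Selector"

# Part 2: hand-rolled gain ratio for a discrete feature.
# Illustration only; assumes gain ratio = information gain / H(feature).
entropy <- function(x) {
  p <- table(x) / length(x)  # empirical probabilities of each distinct value
  p <- p[p > 0]              # drop empty levels so log(0) never appears
  -sum(p * log(p))
}

gain_ratio <- function(feature, target) {
  h_target  <- entropy(target)
  h_feature <- entropy(feature)
  h_joint   <- entropy(paste(feature, target, sep = "|"))
  info_gain <- h_target + h_feature - h_joint
  info_gain / h_feature      # 0 / 0 when the feature has a single value
}

target           <- c(1, 1, 2, 2)
constant_feature <- c("a", "a", "a", "a")  # carries no information
useful_feature   <- c("x", "x", "y", "y")  # determines the target exactly

print(gain_ratio(constant_feature, target))  # NaN
print(gain_ratio(useful_feature, target))
```

A feature that takes only one value has zero entropy, so both the information gain and the denominator are zero and R evaluates the ratio to NaN. Returning 0 in that case is one way to handle it, consistent with reading a zero weight as "no information about the target".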