r, machine-learning, categorical-data, multinomial, gbm

In gbm multinomial dist, how to use predict to get categorical output?


My response is a categorical variable (letters of the alphabet), so I used distribution='multinomial' when fitting the model. Now I want to predict the response and get the output as those letters rather than as a matrix of probabilities.

However, predict(model, newdata, type='response') returns probabilities, the same as type='link'.

Is there a way to obtain categorical outputs?

BST = gbm(V1~.,data=training,distribution='multinomial',n.trees=2000,interaction.depth=4,cv.folds=5,shrinkage=0.005)

predBST = predict(BST,newdata=test,type='response')

Solution

  • The predict.gbm documentation states:

    If type="response" then gbm converts back to the same scale as the outcome. Currently the only effect this will have is returning probabilities for bernoulli and expected counts for poisson. For the other distributions "response" and "link" return the same.

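    What you should do, as Dominic suggests, is pick the class with the highest probability in each row of the resulting predBST matrix, by calling apply(.., 1, which.max) on the prediction output. This returns, for each observation, the column index of the most probable class, which can then be mapped back to a label through the column names. Here is a code sample with the iris dataset: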

    library(gbm)
    
    data(iris)
    
    df <- iris[,-c(1)] # drops the first column (Sepal.Length); the built-in iris has no index column
    
    df <- df[sample(nrow(df)),]  # shuffle
    
    df.train <- df[1:100,]
    df.test <- df[101:150,]
    
    BST = gbm(Species~.,data=df.train,
             distribution='multinomial',
             n.trees=200,
             interaction.depth=4,
             #cv.folds=5,
             shrinkage=0.005)
    
    predBST = predict(BST,n.trees=200, newdata=df.test,type='response')
    
    p.predBST <- apply(predBST, 1, which.max)
    
    > predBST[1:6,,]
             setosa versicolor  virginica
    [1,] 0.89010862 0.05501921 0.05487217
    [2,] 0.09370400 0.45616148 0.45013452
    [3,] 0.05476228 0.05968445 0.88555327
    [4,] 0.05452803 0.06006513 0.88540684
    [5,] 0.05393377 0.06735331 0.87871292
    [6,] 0.05416855 0.06548646 0.88034499
    
    > head(p.predBST)
    [1] 1 2 3 3 3 3
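The indices above still need to be turned into class labels, which is what the question asks for. The following is a minimal sketch of that last step, not part of the original answer. To stay runnable without fitting a model, it uses a small hand-built matrix `probs` (a hypothetical name) whose values are copied from the first three rows of the printed output, and it assumes the columns are named after the classes, as they are in that output.

```r
# Stand-in for the prediction output: rows are observations, columns are classes.
# Values are copied from the first three rows printed above.
probs <- matrix(
  c(0.89010862, 0.05501921, 0.05487217,
    0.09370400, 0.45616148, 0.45013452,
    0.05476228, 0.05968445, 0.88555327),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("setosa", "versicolor", "virginica"))
)

# Column index of the largest probability in each row
idx <- apply(probs, 1, which.max)

# Look the indices up in the column names to recover the class labels,
# keeping the column order as the factor levels
labels <- factor(colnames(probs)[idx], levels = colnames(probs))

print(labels)
```

With the fitted model, the same lookup would presumably read `colnames(predBST)[apply(predBST, 1, which.max)]`, assuming predBST carries the class names in its second dimension as the printed output suggests.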