r machine-learning decision-tree boosting

Reproduce boosting of C5.0 trials


I'm using the C50 package in R and need to export models for production.

I'm using the boosting option. I know that trials are weighted, but the weights are not shown in my output.

I'm not using the weight option for misclassification; I just need the weights of the trials.

Is there a way to get the weight of each trial of my C5.0 model through R?


Solution

  • > fit <- C5.0(credit[,-24], credit[,24])
    > summary(fit)
    
    Call:
    C5.0.default(x = credit[, -24], y = credit[, 24])
    
    
    C5.0 [Release 2.07 GPL Edition]     Thu Nov 23 09:36:14 2017
    -------------------------------
    
    Class specified by attribute `outcome'
    
    Read 30000 cases (24 attributes) from undefined.data
    
    Decision tree:
    
    PAY_0 > 1:
    :...EDUCATION > 3: 0 (29/7)
    :   EDUCATION <= 3:
    :   :...PAY_3 <= -1: 0 (187/86)
    :       PAY_3 > -1: 1 (2914/830)
    PAY_0 <= 1:
    :...PAY_2 <= 1: 0 (24599/3514)
        PAY_2 > 1:
        :...PAY_6 <= 0: 0 (1625/605)
            PAY_6 > 0:
            :...PAY_6 > 2: 1 (58/21)
                PAY_6 <= 2:
                :...PAY_5 <= 0: 0 (132/52)
                    PAY_5 > 0:
                    :...SEX <= 1: 1 (215/82)
                        SEX > 1:
                        :...PAY_3 <= 1: 1 (40/13)
                            PAY_3 > 1: 0 (201/91)
    
    
    Evaluation on training data (30000 cases):
    
            Decision Tree   
          ----------------  
          Size      Errors  
    
            10 5301(17.7%)   <<
    
    
           (a)   (b)    <-classified as
          ----  ----
         22418   946    (a): class 0
          4355  2281    (b): class 1
    
    
        Attribute usage:
    
        100.00% PAY_0
         89.57% PAY_2
         11.14% PAY_3
         10.43% EDUCATION
          7.57% PAY_6
          1.96% PAY_5
          1.52% SEX
    
    
    Time: 2.5 secs
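
The fit above uses the default single trial, so its summary shows one tree and no boosting information. Below is a minimal sketch of a boosted fit and of the per-trial details that the C50 object exposes. It assumes the built-in iris data as a stand-in for the credit data, and the name fit_boost is only illustrative. As far as I can tell, neither the printed summary nor the stored model text lists an explicit weight per trial, so treat this as a starting point for inspection rather than a complete answer.

```r
library(C50)

# Assumption: iris stands in for the credit data used in the answer.
set.seed(1)
fit_boost <- C5.0(x = iris[, -5], y = iris[, 5], trials = 10)

# Requested vs. actually used number of boosting trials
# (the engine may use fewer trials than requested).
print(fit_boost$trials)

# Per-trial size/error table parsed from the printed output, when available.
print(fit_boost$boostResults)

# Raw model text written by the C5.0 engine, one tree per trial.
cat(fit_boost$tree)

# Class probabilities from the boosted model, using all trials actually kept.
p <- predict(fit_boost, newdata = iris[, -5],
             trials = fit_boost$trials["Actual"], type = "prob")
head(p)
```

Comparing the output of predict() for different values of trials (up to the "Actual" count) is one way to check an exported model against the R fit.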
    

    The weight (importance) of each variable used in the model can be found with C5imp:

    > C5imp(fit, metric = "splits")
               Overall
    PAY_3     22.22222
    PAY_6     22.22222
    EDUCATION 11.11111
    PAY_0     11.11111
    PAY_2     11.11111
    PAY_5     11.11111
    SEX       11.11111
    LIMIT_BAL  0.00000
    MARRIAGE   0.00000
    AGE        0.00000
    PAY_4      0.00000
    BILL_AMT1  0.00000
    BILL_AMT2  0.00000
    BILL_AMT3  0.00000
    BILL_AMT4  0.00000
    BILL_AMT5  0.00000
    BILL_AMT6  0.00000
    PAY_AMT1   0.00000
    PAY_AMT2   0.00000
    PAY_AMT3   0.00000
    PAY_AMT4   0.00000
    PAY_AMT5   0.00000
    PAY_AMT6   0.00000
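
C5imp supports two metrics, and the table above uses "splits". The sketch below compares both, again assuming iris as stand-in data. Note that these values are predictor importances, not weights of boosting trials.

```r
library(C50)

# Assumption: iris stands in for the credit data used in the answer.
fit <- C5.0(x = iris[, -5], y = iris[, 5])

# "usage": percentage of training cases covered by splits on each predictor
# (this corresponds to the "Attribute usage" section of summary(fit)).
usage <- C5imp(fit, metric = "usage")

# "splits": percentage of all splits in the model that use each predictor.
splits <- C5imp(fit, metric = "splits")

print(usage)
print(splits)
```

Both calls return a data frame with one row per predictor and a single Overall column; predictors that are never used get a value of 0.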