Tags: r, svm, weka, rweka

Extract weights from an RWeka SMOreg model


I am using the awesome RWeka package to fit an SMOreg model as implemented in Weka. Everything works fine, but I am having trouble extracting the weights from the fitted model.

Like all Weka classifier objects, my model has a nice print method that shows all the features and their relative weights. However, I have not found any way to extract these weights.

You can see for yourself by running the following code:

library(RWeka)
data("mtcars")
SMOreg_classifier <- make_Weka_classifier("weka/classifiers/functions/SMOreg")
model_SMOreg <- SMOreg_classifier(mpg ~ ., data = mtcars)

Now, if you simply call the model

model_SMOreg

you'll see that it prints all the features used in the model with their relative weights. I would like to access those weights as a vector or, even better, as a two-column table with one column containing the feature names and the other containing the weights.

I am working on a Windows 7 x64 system, using RStudio Version 1.0.153, R 3.4.2 Short Summer and RWeka 0.4-35.

Does anyone know how to do this?


Solution

  • Based on the suggestion of @knb, I wrote a function that extracts the weights from an SMOreg model by parsing its printed output. It returns a tibble with one column for the feature names and one for the feature weights, with the rows sorted by decreasing absolute value of the weight.

    Note that this function only works for the SMOreg classifier, as the output of other classifiers is slightly different in terms of layout. However, I think the function can be easily adapted for other classifiers.

    library(stringr)
    library(tidyverse)
    
    extract_weights_from_SMOreg <- function(model) {
    
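      # as.numeric() below is applied to non-numeric tokens on purpose, so
      # silence the coercion warnings and restore the setting on exit,
      # even if the function stops with an error
      oldw <- getOption("warn")
      options(warn = -1)
      on.exit(options(warn = oldw), add = TRUE)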
    
    
      raw_output <- capture.output(model)
      trimmed_output <- raw_output[-c(1:3,(length(raw_output) - 4): length(raw_output))]
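      # one row per feature, plus a final row for the intercept
      # (tibble() replaces the deprecated data_frame())
      df <- tibble(features_name = vector(length = length(trimmed_output) + 1, "character"), 
                   features_weight = vector(length = length(trimmed_output) + 1, "numeric"))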
    
      for (line in seq_along(trimmed_output)) {
    
    
        string_as_vector <- trimmed_output[line] %>%
          str_split(string = ., pattern = " ") %>%
          unlist(.)
    
    
        numeric_element <- trimmed_output[line] %>%
          str_split(string = ., pattern = " ") %>%
          unlist(.) %>%
          as.numeric(.)
    
        position_mul <- string_as_vector[is.na(numeric_element)] %>%
          str_detect(string = ., pattern = "[*]") %>%
          which(.)
    
        numeric_element <- numeric_element %>%
          `[`(., c(1:position_mul))
    
        text_element <- string_as_vector[is.na(numeric_element)]
    
    
        there_is_plus <- string_as_vector[is.na(numeric_element)] %>%
          str_detect(string = ., pattern = "[+]") %>%
          sum(.)
    
        if (there_is_plus) { sign_is <- "+"} else { sign_is <- "-"}
    
    
    
        feature_weight <- numeric_element[!is.na(numeric_element)]
    
        # apply the sign printed in front of the weight
        df[line, "features_weight"] <- if (sign_is == "-") -feature_weight else feature_weight
    
        df[line, "features_name"] <- paste(text_element[(position_mul + 1): length(text_element)], collapse = " ")
    
      }
    
      intercept_line <- raw_output[length(raw_output) - 4]
    
    
      there_is_plus_intercept <- intercept_line %>%
        str_detect(string = ., pattern = "[+]") %>%
        sum(.)
    
      if (there_is_plus_intercept) { intercept_sign_is <- "+"} else { intercept_sign_is <- "-"}
    
      numeric_intercept <- intercept_line %>%
        str_split(string = ., pattern = " ") %>%
        unlist(.) %>%
        as.numeric(.) %>%
        `[`(., length(.))
    
      df[nrow(df), "features_name"] <- "intercept"
    
      if (intercept_sign_is == "-") {df[nrow(df), "features_weight"] <- numeric_intercept * -1} else {df[nrow(df), "features_weight"] <- numeric_intercept}
    
      options(warn = oldw)
    
      df <- df %>%
        arrange(desc(abs(features_weight)))
    
      return(df)
    }
    

    Here is an example for one model:

    library(RWeka)
    data("mtcars")
    SMOreg_classifier <- make_Weka_classifier("weka/classifiers/functions/SMOreg")
    mpg_model_weights <- extract_weights_from_SMOreg(SMOreg_classifier(data = mtcars, mpg ~ .))
    mpg_model_weights
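
The same parsing idea can be written more compactly with regular expressions. The following is only a minimal sketch in base R, not part of the original answer: `parse_smoreg_lines` is a hypothetical helper name, and the sample lines are made up to imitate the layout that SMOreg appears to print (a sign, a number, `*`, then the feature label, with the intercept as a final sign-and-number line). If your Weka version prints a different layout, the patterns need adjusting.

```r
# Hypothetical helper: parse the printed lines of an SMOreg model.
# Assumes term lines look like " -       0.1126 * (normalized) cyl"
# and the intercept line looks like " +       0.5132".
parse_smoreg_lines <- function(lines) {
  num <- "([0-9][0-9.eE+-]*)"
  term_re <- paste0("^\\s*([+-])\\s*", num, "\\s*\\*\\s*(.*\\S)\\s*$")
  intercept_re <- paste0("^\\s*([+-])\\s*", num, "\\s*$")

  # Each match is c(full match, sign, number, feature label)
  terms <- regmatches(lines, regexec(term_re, lines, perl = TRUE))
  terms <- terms[lengths(terms) == 4]
  if (length(terms) == 0) stop("no weight lines found in the printed output")
  terms <- do.call(rbind, terms)

  # Each match is c(full match, sign, number)
  icpt <- regmatches(lines, regexec(intercept_re, lines, perl = TRUE))
  icpt <- icpt[lengths(icpt) == 3]

  signed <- function(sign, value) ifelse(sign == "-", -1, 1) * as.numeric(value)

  out <- data.frame(
    features_name = terms[, 4],
    features_weight = signed(terms[, 2], terms[, 3]),
    stringsAsFactors = FALSE
  )
  if (length(icpt) > 0) {
    out <- rbind(out, data.frame(
      features_name = "intercept",
      features_weight = signed(icpt[[1]][2], icpt[[1]][3]),
      stringsAsFactors = FALSE
    ))
  }

  # Sort by decreasing absolute weight, as in the function above
  out <- out[order(-abs(out$features_weight)), ]
  rownames(out) <- NULL
  out
}

# Made-up sample lines that imitate the printed layout (not real model output)
example_lines <- c(
  "SMOreg", "", "weights (not support vectors):",
  " -       0.1126 * (normalized) cyl",
  " +       0.0329 * (normalized) disp",
  " +       0.5132",
  "", "", "",
  "Number of kernel evaluations: 528 (96.233% cached)"
)
parse_smoreg_lines(example_lines)
```

With a fitted model, the input would come from `capture.output(model_SMOreg)`, exactly as in the function above. Matching on the line pattern instead of on fixed line positions means the header and footer lines do not have to be counted, which may make the approach easier to adapt to other classifiers.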