Search code examples
rshinyfeature-selection

How to stop Boruta algorithm including the outcome variable in the output?


I am developing an app in R shiny that will allow users to run the Boruta algorithm on an uploaded dataset. It is including the outcome variable in the output. I understand it shouldn't do that but don't know how to make it stop. Here's a cut down version of my code - as you can see the iris dataset is loaded and the user selects the outcome variable before running the feature selection on the second page:

library(shiny)          ## for app building
library(shinythemes)    ## for formatting
library(Boruta)         ## Boruta algorithm

# Define UI for demo app ----
ui <- fluidPage(
  navbarPage("", collapsible = TRUE, inverse = TRUE, theme = shinytheme("cerulean"),
             tabPanel("Read data",
                      fluidPage(
                        sidebarLayout(
                          sidebarPanel(width = 4,
                             # Outcome variable ----
                             textOutput("Outcome1"),
                             ),  
                            # Output: Data file ----
                          mainPanel(
                            h4("Table 1"),
                            helpText("Click into the table to identify the outcome variable."),
                            DTOutput("File1_table"),
                            )
                        ))),

             tabPanel("Feature selection",
                      fluidPage(
                        br(),
                        sidebarLayout(      
                          sidebarPanel(
                            selectInput("For_Boruta", "Choose a dataset for feature selection", c("Table1","Table2"),selected = NULL),
                            hr(),
                            textOutput("recap_selections"),
                            actionButton("boruta", "Run Boruta"),
                          ),
                          # Boruta output
                          mainPanel(
                            h4("Boruta Results"),
                            tableOutput("BorutaTbl"),
                            plotOutput("BorutaPlot"),
                          )
                        ))),
  ))
################################
#                              #
#    Server side logic         #
#                              #
################################

server <- function(input, output, session) {

  Table1 <- reactive({iris })
  Out_var1 <- NULL
  
  #show the iris file that has been read in
   output$File1_table <- renderDT(
    Table1(), options = list(pageLength = 10, lengthChange = FALSE, searching = FALSE),
    selection = list(target = 'column') #allows selection of columns
  )
  #save chosen variable in Out_var1
  Out_var1 <<- reactive({names(Table1()[input$File1_table_columns_selected])})  
  #report variable for table1
  output$Outcome1 <- renderPrint(cat('The outcome variable is:\n\n', Out_var1()))

  ##############################
  # Run BORUTA and show output #
  ##############################
  
  #provide some text to make clear what will run
  output$recap_selections <- renderText({
    paste("You have chosen to run Boruta on", input$For_Boruta, "where", Out_var1(), "is the outcome variable.
          If that is correct click below to run.")
  })
  
  #use the boruta button to run as defined above
  observeEvent(input$boruta, {
    X = eval(parse(text=paste(input$For_Boruta,"()"))) #this refers to the chosen dataset
    set.seed(206)
    boruta_output <- Boruta(x = X,
                            y = as.factor(X[[Out_var1()]]), doTrace=2)
    output$BorutaTbl <- renderTable(attStats(boruta_output),rownames = TRUE)
    output$BorutaPlot <- renderPlot({plot(boruta_output)})
  })
}

# Create Shiny app ----
shinyApp(ui, server)

I tried removing the as.factor() but it didn't seem to make a difference - the "Species" variable still appears in the output. I'd be grateful for any advice?


Solution

  • On further inspection, second comment above is incorrect: the authors do not drop the classification column from the input data frame. But they also do not use the Boruta function in a way that I would consider "obvious", even in their own examples.

    I don't have the Boruta package, and I am not willing to install it (and its dependencies) simply to answer a question on SO. However, I think I can point you in the right direction.

    The examples in the documentation for the Boruta function generally start like this:

    Boruta(Y~.,data=srx)->Boruta.srx
    

    In other words, they provide a formula as the first argument. You need a dynamic formula, rather than a static one, but we can manage that. But before I show you how, I need to digress to a couple of changes I made to your example code.

    As I mentioned in my first comment, nesting reactives is not a good idea. In my experience, it always ends badly. There are also simpler (to my mind) and easier ways of selecting an input data frame and an outcome column.

    First, I modify your definition of the input widget that selects the input data frame and create a new one to select an outcome column.

    selectInput(
      "For_Boruta",
      "Choose a dataset for feature selection",
      c("iris","mtcars"),
      selected = "iris"
    ),
    selectInput(
      "outcomeVar",
      "Choose a column as the outcome variable",
      c(),
      selected = NULL
    )
    

    Note that the choices for input$outcomeVar are undefined. That's because the available columns depend on the selected data frame.

    So, in the app's server function, you need to define a reactive that holds the input data frame

      selectedData <- reactive({
        if(input$For_Boruta == "iris") {
            iris
          } else {
            mtcars
          }
      })
    

    and update the choices for input$outcomeVar whenever it changes

      observeEvent(selectedData(), {
        updateSelectInput(session, "outcomeVar", choices = names(selectedData()))
      })
    

    Now you can create a reactive to contain the results of the Boruta analysis

      borutaOutput <- reactive({
        set.seed(206)
        # Following format of first code snippet in the Boruta vignette
        Boruta(
          as.formula(paste0(input$outcomeVar, " ~ .")),
          data = selectedData()
        )
      })
    

    Note the creation of the input formula with as.formula(paste0(input$outcomeVar, " ~ .")) and provision of the input data frame with data = selectedData().

    Now you create your output plot and table.

      output$BorutaTbl <- renderTable(
        attStats(borutaOutput()),
        rownames = TRUE
      )
    
      output$BorutaPlot <- renderPlot({
        plot(borutaOutput())
      })
    

    As a result of these changes, there is no nesting of reactives in my code.

    There are other consequential changes to your "debugging" widgets, but they aren't central to the main purpose of your app. I also dropped your formatting and layout options, as they are peripheral to your question.

    Here's the full source of the app.

    library(shiny)          ## for app building
    library(DT)
    
    # Define UI for demo app ----
    ui <- fluidPage(
      navbarPage("", collapsible = TRUE, inverse = TRUE),
                 tabPanel("Read data",
                          fluidPage(
                            sidebarLayout(
                              sidebarPanel(width = 4,
                                           textOutput("Outcome1"),
                              ),
                              mainPanel(
                                h4("Table 1"),
                                DTOutput("File1_table"),
                              )
                            ))),
    
                 tabPanel("Feature selection",
                          fluidPage(
                            br(),
                            sidebarLayout(
                              sidebarPanel(
                                selectInput(
                                  "For_Boruta",
                                  "Choose a dataset for feature selection",
                                  c("iris","mtcars"),
                                  selected = "iris"
                                ),
                                selectInput(
                                  "outcomeVar",
                                  "Choose a column as the outcome variable",
                                  c(),
                                  selected = NULL
                                ),
                                hr(),
                                textOutput("recap_selections"),
                                actionButton("boruta", "Run Boruta"),
                              ),
                              mainPanel(
                                h4("Boruta Results"),
                                tableOutput("BorutaTbl"),
                                plotOutput("BorutaPlot"),
                              )
                            )))
      )
    
    server <- function(input, output, session) {
      selectedData <- reactive({
        if(input$For_Boruta == "iris") {
            iris
          } else {
            mtcars
          }
      })
    
      observeEvent(selectedData(), {
        updateSelectInput(session, "outcomeVar", choices = names(selectedData()))
      })
    
      output$File1_table <- renderDT(
        selectedData(), options = list(pageLength = 10, lengthChange = FALSE, searching = FALSE)
      )
    
      output$Outcome1 <- renderPrint(cat('The outcome variable is:\n\n', input$outcomeVar))
    
      #provide some text to make clear what will run
      output$recap_selections <- renderText({
        paste("You have chosen to run Boruta on", input$For_Boruta, "where", input$outcomeVar, "is the outcome variable.
              If that is correct click below to run.")
      })
    
      borutaOutput <- reactive({
        set.seed(206)
        # Following format of first code snippet in the Boruta vignette
        Boruta(
          as.formula(paste0(input$outcomeVar, " ~ .")),
          data = selectedData()
        )
      })
    
      output$BorutaTbl <- renderTable(
        attStats(borutaOutput()),
        rownames = TRUE
      )
    
      output$BorutaPlot <- renderPlot({
        plot(borutaOutput())
      })
    }
    
    shinyApp(ui, server)
    

    Remember, I don't have the Boruta package, so I can't test the code relating to its use. However, everything else in the app works as I would expect. If the output from the Boruta call still includes column(s) that you don't want, simply drop them within the borutaTbl widget. Something like this should do the trick:

      output$BorutaTbl <- renderTable(
        attStats(borutaOutput()) %>% select(-c(<columns-to-drop>)),
        rownames = TRUE
      )
    

    That's a tidyverse solution. You can use base R functions as well if you wish.