r shiny glm

Shiny downloadHandler produces an unnecessarily big file


I am building a Shiny application in which you can train a model. One feature lets the user download the model object (in this case, a glm object) so that it can be used later, outside the application. The relevant part of my code looks as follows:

library(shiny)
library(car)

ui <- fluidPage(

  # What parameter do you wish to estimate
  selectInput(inputId = "dependent_variable",
              label = "Select dependent variable",
              choices = c("education",
                          "vocabulary")),

  # Download button for model
  downloadButton(outputId = "download_model", label = 'Download Model')
)

server <- function(input, output){

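  # Remove components of a fitted glm that are not needed for prediction
  strip_glm <- function(cm) {
    # Copies of the training data
    cm$y <- NULL
    cm$model <- NULL
    cm$data <- NULL

    # Per-observation vectors and the QR matrix
    cm$residuals <- NULL
    cm$fitted.values <- NULL
    cm$effects <- NULL
    cm$qr$qr <- NULL
    cm$linear.predictors <- NULL
    cm$weights <- NULL
    cm$prior.weights <- NULL

    # Family functions that are only used while fitting
    cm$family$variance <- NULL
    cm$family$dev.resids <- NULL
    cm$family$aic <- NULL
    cm$family$validmu <- NULL
    cm$family$simulate <- NULL

    # Environments captured by the terms and the formula
    attr(cm$terms, ".Environment") <- NULL
    attr(cm$formula, ".Environment") <- NULL

    return(cm)
  }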

  reactive_glm_model <- reactive(glm(paste0(input$dependent_variable, "~."), data = Vocab))
  stripped_glm <- reactive(strip_glm(reactive_glm_model()))
  stripped_glm_summary <- reactive(summary(reactive_glm_model()))

  output$download_model <- downloadHandler(
    filename = function() {
      "report.Rd"
    },
    content = function(file) {

      glm_object <- stripped_glm()
      glm_summary <- stripped_glm_summary()
      save(glm_object, glm_summary, file = file)
    }
  )

}

shinyApp(ui, server)

I use the strip_glm() function because I don't want the glm object to be too big or to carry unnecessary data; it only needs to be able to predict. However, summary() no longer works on the stripped glm, so I'd like to return the summary as well.

So here is my problem: when I download the object, there still seem to be some 'hidden' objects that make the file too big. In this reprex the file is 16.2 MB, whereas if I load the saved objects back into memory, their actual size is far smaller:

load("report.Rd")
object.size(glm_object) # 22 kB
object.size(glm_summary) # 2.5 MB

What is going on here? For the models I actually use, the data potentially has millions of rows, which makes the file several GB in size and the download take ages.
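
A possible line of investigation, not confirmed anywhere in this thread: object.size() does not count the environments attached to functions and formulas, whereas save() serializes them together with the object. The sketch below uses a hypothetical closure (make_closure and big are illustrative names, not part of the app) to show how the two measurements can diverge.

```r
# Hypothetical example: a closure keeps the frame it was created in alive,
# including large local objects that it never uses.
make_closure <- function() {
  big <- numeric(1e6)   # roughly 8 MB of doubles
  function(x) x + 1
}
f <- make_closure()

# object.size() ignores the enclosing environment of the function ...
in_memory <- as.numeric(object.size(f))

# ... while serialization (which save() relies on) writes it out as well.
serialized <- length(serialize(f, NULL))

print(in_memory)
print(serialized)
```

If the same gap shows up for glm_object or glm_summary, comparing length(serialize(x, NULL)) before and after stripping would indicate which part carries the extra data.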


UPDATE

It seems to be related to the R version or the underlying settings. The setup above, in which I do encounter the problem, uses:

platform       x86_64-redhat-linux-gnu     
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          5.2                         
year           2018                        
month          12                          
day            20                          
svn rev        75870                       
language       R                           
version.string R version 3.5.2 (2018-12-20)
nickname       Eggshell Igloo

Unfortunately, I am not able to update the version of R due to policy constraints.


UPDATE II

It seems the problem is not related to R or Shiny, and it is not reproducible on other platforms.


Solution

  • Colleague here. We run this code with RStudio Server, which seems to be causing the problem. Running the reprex with R itself (still on the same server, using the same R executable), bypassing RStudio, fixes the issue: the downloaded R object is a little over 2 MB.

    No idea why using RStudio is messing things up, though. The version used is RStudio Server (Pro) 1.2.5001-3.
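
Since the thread does not establish why RStudio Server inflates the file, one way to narrow it down would be to measure each component of the saved objects separately. The following is a minimal sketch that assumes a small model on the built-in mtcars data in place of the real one; serialized_size is a hypothetical helper, not a Shiny or RStudio function.

```r
# Hypothetical helper: number of bytes an object occupies once serialized,
# which is closer to what save() writes than object.size() is.
serialized_size <- function(x) length(serialize(x, NULL))

# Stand-in model; in the app this would be the glm built from Vocab.
fit <- glm(mpg ~ wt + hp, data = mtcars)
fit_summary <- summary(fit)

# Compare the in-memory and serialized size of every summary component.
sizes <- data.frame(
  component   = names(fit_summary),
  object_size = vapply(fit_summary,
                       function(el) as.numeric(object.size(el)),
                       numeric(1)),
  serialized  = vapply(fit_summary,
                       function(el) as.numeric(serialized_size(el)),
                       numeric(1))
)

# Largest serialized components first
print(sizes[order(-sizes$serialized), ])
```

Running this once under RStudio Server and once under plain R, and comparing the two tables, would show whether a single component (for example one that holds an environment) accounts for the difference.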