Search code examples
rshinydatatablesdtfst

how to load & render part of data when user click on certain page


I am facing an out-of-memory issue in R, when I tried to load multiple tables and render using DT in shiny.

I am wondering if it is possible to only provide table structure (eg, no of rows & column names) to DT, and pre-load first N rows data to display in the app, then load another N rows when user click another page (pagination enabled). I found DT has a dataTableAjax function that return a Ajax URL and can be queried by DT (not sure how it done)

The original datatables JS library has an similar feature (if I not mistaken), as in https://datatables.net/examples/server_side/defer_loading.html

For example,

sample_table <- data.frame(a = rnorm(1e7), b = rnorm(1e7), c = rnorm(1e7))

library(fst)

# write large data on disk
write_fst(sample_table, "sample_table.fst")

# how to load data on disk on-demand using Ajax?
shinyApp(
  ui = fluidPage(
    title = 'Server-side processing of DataTables',
    fluidRow(
      DT::dataTableOutput('tbl')
    )
  ),
  server = function(input, output, session) {
    # create a widget using an Ajax URL created above
    tbl_ajax_url <- reactiveVal({
      dataTableAjax(
        session, 
        read_fst("sample_table.fst", from = 1, to = 100, as.data.table = TRUE), 
        outputId = 'tbl')
    })
    observeEvent(input$tbl_rows_current, {
      rows <- input$tbl_rows_current
      tbl_ajax_url(dataTableAjax(
        session, 
        # random access like fst, only load required data when user click the page
        read_fst("sample_table.fst", from = min(rows), to = max(rows), 
                 as.data.table = TRUE), 
        outputId = 'tbl'))
    })

    output$tbl = DT::renderDataTable({
      datatable(data.table(
        a = numeric(), b = numeric(), c = numeric(),
        check.names = FALSE), rownames = FALSE, options = list(
          ajax = list(
            serverSide = TRUE, processing = TRUE,
            # not sure how to do this part, where url only return part of data
            url = tbl_ajax_url()
          )
        ))
    })
  }
)

If you have any other suggestions, please let me know as well. My primary objective is to prevent loading all tables in R at once, instead only load partially on-demand.

PS: I am not familiar with any HTML, CSS & JS, please be patient and provide as many details as possible, thanks in advance!


Solution

  • I have figure out a solution by myself, but I just put here in case somebody interested.

    Using funcFilter in renderDT, we can create a new data source and render to DT. I created a on-disk data source, which only read data saved on disk when needed using fst.

    Code:

    sample_table <- data.frame(a = rnorm(1e7), b = rnorm(1e7), c = rnorm(1e7))
    
    library(fst)
    library(shiny)
    library(DT)
    library(data.table)
    
    # write large data on disk
    write_fst(sample_table, "sample_table.fst")
    
    shinyApp(
      ui = fluidPage(
        title = 'Server-side processing of DataTables',
        fluidRow(
          DT::dataTableOutput('tbl')
        )
      ),
      server = function(input, output, session) {
        output$tbl = DT::renderDataTable({
          datatable(data.frame(
            a = numeric(), b = numeric(), c = numeric(),
            check.names = FALSE), rownames = FALSE)
        }, funcFilter = dataTablesFilterOnDisk)
      }
    )
    
    dataTablesFilterOnDisk <- function(data, params) {
      start <- as.integer(params$start)
      length <- as.integer(params$length)
      total_rows <- fst::metadata_fst("sample_table.fst")$nrOfRows
      cleanDataFrame <- function(x, escape = params$escape) {
        if (escape != "false") {
          k = seq_len(ncol(x))
          if (escape != "true") {
            k = k[as.integer(strsplit(escape, ",")[[1]])]
          }
          for (j in k) if (is.character(x[, j]) || is.factor(x[, j])) 
            x[, j] = htmltools::htmlEscape(x[, j])
        }
        x = unname(x)  # remove column names
        if (!is.data.frame(x)) return(x)
        for (j in seq_len(ncol(x))) {
          xj = x[, j]
          xj = unname(xj)  # remove names
          dim(xj) = NULL  # drop dimensions
          if (is.table(xj)) xj = c(xj)  # drop the table class
          x[[j]] = xj
        }
        unname(x)
      }
      row_range <- c(start + 1L, start + length)
    
      data <- fst::read_fst("sample_table.fst", columns = colnames(data), 
                            from = row_range[1L], 
                            to = min(row_range[2L], total_rows))
      
      list(draw = as.integer(params$draw), recordsTotal = total_rows, 
           recordsFiltered = total_rows, data = cleanDataFrame(data), 
           DT_rows_all = NULL, 
           DT_rows_current = seq.int(row_range[1L], row_range[2L], by = 1L))
    }
    

    See more references:

    1. https://github.com/grahamrp/dtdatasources