
highcharter hcaes "group" usage while plotting large amounts of data with highchart2()


I am trying to plot large datasets (> 50k rows of data) as scatterplots using the highcharter package. After some reading I found out that the highchart2() function includes the boost module from Highcharts, which should improve performance considerably when plotting large amounts of data. Take the following example:

library(highcharter) # I'm using the latest version from github (0.5.0.9999)

x <- data.frame(a = rnorm(5000),
                b = rnorm(5000),
                cat = c(rep("Yes", 2500), rep("No",2500)))



highchart() %>%
  hc_add_series(data = x, type = "scatter", hcaes(x=a, y=b, group=cat))

This correctly creates a scatterplot, but performance already suffers because of the amount of data. That is why I switched to highchart2(), but to my surprise the plot does not show any data points when I try:

highchart2() %>%
  hc_add_series(data = x, type = "scatter", hcaes(x=a, y=b, group=cat))

And after some more searching and reading I found out that when using list_parse2() the plot is rendered much faster, so I tried this:

highchart2() %>%
  hc_add_series(data = list_parse2(x), type = "scatter", hcaes(x=a, y=b, group=cat))

Of course this doesn't work, because I changed the structure of the input data and stripped the names of the variables I was passing to hcaes(). Then, when I tried this:
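
The stripped names can be seen in a small base-R sketch. It only imitates the shape of the list_parse2() output and is not highcharter's implementation; `x_small` is a made-up two-row data frame used for illustration:

```r
# Hypothetical two-row data frame with the same columns as `x`
x_small <- data.frame(a = c(0.1, 0.2),
                      b = c(1.5, 2.5),
                      cat = c("Yes", "No"),
                      stringsAsFactors = FALSE)

# Named rows: every point keeps its column names (a, b, cat)
named_rows <- lapply(seq_len(nrow(x_small)),
                     function(i) as.list(x_small[i, ]))

# Unnamed rows: roughly the shape of the list_parse2() output,
# where each point is a bare list of values
unnamed_rows <- lapply(named_rows, unname)

names(named_rows[[1]])    # "a" "b" "cat"
names(unnamed_rows[[1]])  # NULL
```

Once the names are gone, there is no `a`, `b` or `cat` left for a mapping to refer to.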

highchart2() %>%
  hc_add_series(data = list_parse2(x), type = "scatter")

I got a plot that renders very quickly, BUT I cannot get the grouping to work: the points are no longer differentiated between "Yes" and "No", so they are all the same color.

So my question is: how can I efficiently plot large datasets with highcharter while keeping the ability to assign a variable to the "group" parameter in hcaes()?

Thanks in advance for your help.


Solution

  • A small disclaimer: hcaes() works only if the data object is a data.frame.

    You can use dplyr to build a data frame of series with group_by(), and then use the auxiliary function hc_add_series_list() to add more than one series at once.

    library(highcharter)  # I'm using the latest version from github (0.5.0.9999)
    
    x <- data.frame(a = rnorm(5000),
                    b = rnorm(5000),
                    cat = c(rep("Yes", 2500), rep("No", 2500)))
    
    library(dplyr)
    
    xseries <- x %>% 
      # use `name` to name each series according to the value of the `cat` variable
      group_by(name = cat) %>% 
      do(data = list_parse2(.)) %>%
      # add type of series
      mutate(type = "scatter")
    
    # A data frame of series
    xseries
    #> Source: local data frame [2 x 3]
    #> Groups: <by row>
    #> 
    #> # A tibble: 2 x 3
    #>     name           data    type
    #>   <fctr>         <list>   <chr>
    #> 1     No <list [2,500]> scatter
    #> 2    Yes <list [2,500]> scatter
    

    And finally:

    highchart2() %>% 
      hc_add_series_list(xseries)
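
The same grouping can also be sketched without dplyr. This is a minimal base-R variant, not part of the approach above: the helper `make_series()` is a hypothetical name introduced here, and it assumes that hc_add_series_list() accepts a plain list of series as well as a data frame of series. Only `a` and `b` are kept for each point, so every point is an unnamed (x, y) pair:

```r
set.seed(1)  # reproducible sample data

x <- data.frame(a = rnorm(5000),
                b = rnorm(5000),
                cat = c(rep("Yes", 2500), rep("No", 2500)),
                stringsAsFactors = FALSE)

# Hypothetical helper: turn one group into a series definition.
# Each point is an unnamed list(x, y) built from columns `a` and `b`.
make_series <- function(df, name) {
  list(
    name = name,
    type = "scatter",
    data = unname(Map(function(a, b) list(a, b), df$a, df$b))
  )
}

# One data frame per level of `cat`, then one series per data frame
groups <- split(x, x$cat)
series <- unname(Map(make_series, groups, names(groups)))

# Build the chart only if highcharter is installed
if (requireNamespace("highcharter", quietly = TRUE)) {
  library(highcharter)
  hc <- highchart2() %>%
    hc_add_series_list(series)
  # print(hc) renders the chart in an interactive session
}
```

Each element of `series` carries its own `name`, so the two groups get separate colors and legend entries, which is what the `group` mapping in hcaes() would otherwise provide.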
    
