I am trying to plot large datasets (> 50k rows of data) as scatterplots using the highcharter package. After some reading I found out that the highchart2() function includes the boost module from Highcharts, which should improve performance a lot when plotting large amounts of data. Take the following example:
library(highcharter) # I'm using the latest version from github (0.5.0.9999)
x <- data.frame(a = rnorm(5000),
                b = rnorm(5000),
                cat = c(rep("Yes", 2500), rep("No", 2500)))
highchart() %>%
hc_add_series(data = x, type = "scatter", hcaes(x=a, y=b, group=cat))
This should correctly create a scatterplot, but it already has some performance issues due to the amount of data. This is why I switched to highchart2(), but to my surprise the plot does not show any data points when trying:
highchart2() %>%
hc_add_series(data = x, type = "scatter", hcaes(x=a, y=b, group=cat))
After some more searching and reading I found out that the plot renders much faster when using list_parse2(), so I tried this:
highchart2() %>%
hc_add_series(data = list_parse2(x), type = "scatter", hcaes(x=a, y=b, group=cat))
And of course it doesn't work, because I changed the structure of the input data and stripped the names of the variables I was passing to hcaes(). Then, when I tried this:
highchart2() %>%
hc_add_series(data = list_parse2(x), type = "scatter")
I got a plot that rendered very fast, BUT I cannot get the grouping that differentiates between "Yes" and "No" at each point to work, so all points are now the same color.
So my question is: how can I efficiently plot large datasets with highcharter while keeping the ability to assign a variable to the "group" parameter in hcaes()?
Thanks in advance for your help.
A mini disclaimer: hcaes() works only if the data object is a data.frame.
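To see why hcaes() has nothing to map once list_parse2() is involved, here is a small sketch comparing list_parse() and list_parse2() on a two-row toy data frame (`df` and its values are made up for illustration). As far as I understand these helpers, list_parse() turns each row into a named list, while list_parse2() drops the names, so the result is a plain list of [x, y] pairs rather than a data.frame.

```r
library(highcharter)

# Toy data (hypothetical values) with the same column names as in the question
df <- data.frame(a = c(0.5, -1.2), b = c(1.1, 0.3))

# list_parse(): one named list per row, i.e. points like {a: .., b: ..}
named <- list_parse(df)
str(named[[1]])

# list_parse2(): one unnamed list per row, i.e. bare [x, y] pairs
unnamed <- list_parse2(df)
str(unnamed[[1]])

# The result is a plain list, not a data.frame, so hcaes() has no columns to map
is.data.frame(unnamed)
```

This is why grouping has to happen before the data are parsed: once the rows are bare pairs, there is no `cat` column left for hcaes() to look up.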
Now, you can use dplyr to get a data frame of series with the group_by function, and then use the auxiliary function hc_add_series_list to add more than one series simultaneously.
library(highcharter) # I'm using the latest version from github (0.5.0.9999)
x <- data.frame(a = rnorm(5000), b = rnorm(5000),
                cat = c(rep("Yes", 2500), rep("No", 2500)))
library(dplyr)
xseries <- x %>%
# use `name` to name each series according to the value of the `cat` variable
group_by(name = cat) %>%
do(data = list_parse2(.)) %>%
# add type of series
mutate(type = "scatter")
# A data frame of series
xseries
#> Source: local data frame [2 x 3]
#> Groups: <by row>
#>
#> # A tibble: 2 x 3
#> name data type
#> <fctr> <list> <chr>
#> 1 No <list [2,500]> scatter
#> 2 Yes <list [2,500]> scatter
And finally:
highchart2() %>%
hc_add_series_list(xseries)
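If you'd rather not depend on dplyr's do() (marked as superseded since dplyr 1.0.0), the same list of series can be sketched in base R with split() and lapply(). This is an alternative, not the answer's method, and it assumes that hc_add_series_list() accepts a plain list of series lists as well as a data frame; I believe it does, but I haven't checked it against 0.5.0.9999. It also parses only the `a` and `b` columns, so each point carries just its x and y values.

```r
library(highcharter)

x <- data.frame(a = rnorm(5000), b = rnorm(5000),
                cat = c(rep("Yes", 2500), rep("No", 2500)))

# Split the rows by group and build one series (a named list) per group.
# Only the numeric columns a and b are parsed, so each point is a bare [x, y] pair.
xseries <- lapply(split(x, x$cat), function(d) {
  list(name = as.character(d$cat[1]),
       type = "scatter",
       data = list_parse2(d[, c("a", "b")]))
})

# split() returns a named list; drop the names so the list of series
# serializes to a JSON array rather than a JSON object
xseries <- unname(xseries)
```

The result is then used the same way as in the answer, with highchart2() %>% hc_add_series_list(xseries), and each group gets its own series and therefore its own color.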