I need to include an interactive box-and-whisker plot in a Shiny app for a dataset with ~46 million rows across 11 groups. I'd like to use ggplot + plotly to achieve this. Because ggplot takes far too long to generate the plot (and plotly can't even handle that much data), I decided to precalculate the quantiles and use those values with ggplot. Here is an example of the quantiles dataset and the ggplot code that produces the boxplot:
quantiles_hw_dt=data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
dept_id = c("TFWHH9388IU","YGQGI3019WK",
"DKGYA0367QU","TOXLN0137AW","XLETL1793EZ","UXYFN1869CM",
"LLHPP0112XP","GYKJF2649DH","RKPIE1418HX",
"AZOMD4805RL","UZGWY7250YJ"),
`0%` = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
`25%` = c(8L, 5L, 13L, 7L, 8L, 7L, 6L, 11L, 12L, 9L, 10L),
`50%` = c(12L, 7L, 20L, 10L, 11L, 9L, 8L, 18L, 19L, 14L, 16L),
`75%` = c(17L, 9L, 29L, 14L, 16L, 12L, 10L, 25L, 28L, 21L, 23L),
`100%` = c(63L, 27L, 96L, 48L, 57L, 42L, 34L, 88L, 91L, 76L, 71L)
)
library(ggplot2)
p <- ggplot(quantiles_hw_dt, aes(dept_id)) +
geom_boxplot(
aes(ymin = `0%`, lower = `25%`, middle = `50%`, upper = `75%`, ymax = `100%`),
stat = "identity"
) + coord_flip()
p
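For reference, a minimal base R sketch of how such a quantile table could be built from the raw rows. The data frame `raw` and its `value` column are hypothetical stand-ins for the real 46-million-row data; at that size a grouped data.table computation would probably be faster, but the shape of the result is the same. Using the 0% and 100% quantiles as whisker ends gives min/max whiskers, not ggplot's default 1.5 * IQR whiskers.

```r
# Hypothetical raw data: one row per observation (names are assumptions)
set.seed(42)
raw <- data.frame(
  dept_id = sample(c("AAA", "BBB", "CCC"), 10000, replace = TRUE),
  value   = rpois(10000, lambda = 10) + 3L
)

# quantile() returns the five-number summary named "0%", "25%", ..., "100%";
# sapply() gives one column per group, so transpose to one row per group
q <- t(sapply(split(raw$value, raw$dept_id), quantile))

# Same layout as quantiles_hw_dt: a dept_id column plus one column per quantile
quantiles_raw <- data.frame(dept_id = rownames(q), q,
                            check.names = FALSE, row.names = NULL)
quantiles_raw
```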
However, when I try to convert it to plotly, I get a blank canvas:
library(plotly)
l <- plotly_build(p)
l$data[[1]]$orientation <- "h"
l
I am aware of some old issues plotly has with coord_flip(), hence the plotly_build approach I attempted (after ggplotly() failed as well). It didn't seem to make any difference. Even removing the coord_flip() call doesn't solve the problem. Here is the plotly output of the same ggplot without coord_flip():
What am I missing here? Thanks.
I commented yesterday, but you asked a few weeks ago and hadn't gotten any answers. As I stated in my comment, setting the axis ranges explicitly can noticeably reduce Plotly's processing time. Without a preset range, Plotly has to process all of the data before it can even build the base plot, just to establish the range. You won't notice a difference with a dataset the size of this example, only when there is a significant amount of data.
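Like ggplot, plot_ly() lets you specify x, y, and groups, but for box traces you can also supply the summary statistics directly (lowerfence, q1, median, q3, upperfence).
Using the same precomputed quantiles you used with ggplot: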
plot_ly(quantiles_hw_dt, type = "box", y = ~dept_id,
lowerfence = ~`0%`, q1 = ~`25%`, median = ~`50%`,
q3 = ~`75%`, upperfence = ~`100%`)
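If you'd rather not rely on plotly.js inferring the box orientation, the box trace's documented `orientation` attribute can be set explicitly. A small sketch on the first three departments from the question (the `qs` subset is just for illustration); note that only these few summary values end up in the trace, not the raw rows, which is what keeps this approach fast:

```r
library(plotly)

# First three rows of quantiles_hw_dt from the question
qs <- data.frame(
  dept_id = c("TFWHH9388IU", "YGQGI3019WK", "DKGYA0367QU"),
  `0%` = c(3L, 3L, 3L), `25%` = c(8L, 5L, 13L), `50%` = c(12L, 7L, 20L),
  `75%` = c(17L, 9L, 29L), `100%` = c(63L, 27L, 96L),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Horizontal boxes: categories on y, precomputed statistics along x
fig <- plot_ly(qs, type = "box", y = ~dept_id, orientation = "h",
               lowerfence = ~`0%`, q1 = ~`25%`, median = ~`50%`,
               q3 = ~`75%`, upperfence = ~`100%`)

# plotly_build() evaluates the formulas, so the trace holds the summary vectors
built <- plotly_build(fig)
built$x$data[[1]]$q1
```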
This is the default plot with no styles:
Setting the ranges is done in layout(). For the y-axis, I extracted the unique dept_id values. For the x-axis, I set the range to c(0, 100), which covers all of the values in this sample (the largest upper fence is 96).
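I sorted the y-axis labels when I extracted them. When you assign the categories this way (via categoryarray), they appear in the plot in exactly the order you supply; they won't be alphabetized, for example, unless you sort them. I sorted them in decreasing order because Plotly draws the first category at the bottom of the y-axis, so the labels read alphabetically from top to bottom.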
I also assigned padding, so that the y-axis labels weren't pushed up against the plot.
# identify the ranges for the plot
ys <- sort(unique(quantiles_hw_dt$dept_id), decreasing = TRUE)
xs <- c(0, 100)
plot_ly(quantiles_hw_dt, type = "box", y = ~dept_id,
lowerfence = ~`0%`, q1 = ~`25%`, median = ~`50%`,
q3 = ~`75%`, upperfence = ~`100%`) %>%
layout(xaxis = list(range = xs),
yaxis = list(categoryarray = ys),
margin = list(pad = 10))
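Hard-coding c(0, 100) works for this sample only because its largest value is 96. A sketch that derives both the x-axis range and the category order from the table itself, so the layout keeps working if the data change. `qtab` is a hypothetical three-row subset of quantiles_hw_dt, and the 5% headroom is an arbitrary choice:

```r
# Upper-fence column and ids taken from quantiles_hw_dt in the question
qtab <- data.frame(
  dept_id = c("TFWHH9388IU", "YGQGI3019WK", "DKGYA0367QU"),
  `100%` = c(63L, 27L, 96L),
  check.names = FALSE, stringsAsFactors = FALSE
)

# x-axis: from 0 to the largest upper fence, plus 5% headroom
xs <- c(0, max(qtab[["100%"]]) * 1.05)

# y-axis: decreasing sort, so the alphabetically first id is drawn at the top
ys <- sort(unique(qtab$dept_id), decreasing = TRUE)

xs
ys
```

These vectors then go into layout(xaxis = list(range = xs), yaxis = list(categoryarray = ys)) the same way as in the code above.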
If you want it to look more like ggplot, you can reuse the layout of the plot you attempted to create: even though ggplotly(p) draws no boxes, its layout still carries ggplot's theme settings. This doesn't include all of the styling, but it should be enough to show how you can change the style without much effort.
p2 <- ggplotly(p) # create empty plot to cannibalize styles
x = p2$x$layout # extract layout
plot_ly(quantiles_hw_dt, type = "box", y = ~dept_id,
lowerfence = ~`0%`, q1 = ~`25%`, median = ~`50%`,
q3 = ~`75%`, upperfence = ~`100%`) %>%
layout(margin = x$margin, plot_bgcolor = x$plot_bgcolor, # attach new styles
paper_bgcolor = x$paper_bgcolor, font = x$font,
xaxis = list(showline = x$xaxis$showline,
linecolor = x$xaxis$linecolor,
gridcolor = x$xaxis$gridcolor,
linewidth = x$xaxis$linewidth,
zeroline = x$xaxis$zeroline,
tickfont = x$xaxis$tickfont,
ticklen = x$xaxis$ticklen),
yaxis = list(showline = x$yaxis$showline,
linecolor = x$yaxis$linecolor,
gridcolor = x$yaxis$gridcolor,
linewidth = x$yaxis$linewidth,
zeroline = x$yaxis$zeroline,
tickfont = x$yaxis$tickfont,
ticklen = x$yaxis$ticklen,
title = x$yaxis$title))
(This last plot doesn't include range setting.)
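The long list of x$xaxis$... and x$yaxis$... entries can be shortened by subsetting each axis list by name. A sketch, where pick_axis() and axis_props are hypothetical helper names and fake_yaxis is a stand-in (illustrative values only) for p2$x$layout$yaxis, so the snippet runs without a plot. Copying only named properties, rather than the whole axis list, leaves out entries such as range and type, which come from ggplot's own axes and would likely clash with the categorical y-axis of the plot_ly() version.

```r
# Axis properties worth borrowing from the ggplotly() layout
axis_props <- c("showline", "linecolor", "gridcolor", "linewidth",
                "zeroline", "tickfont", "ticklen", "title")

# Keep only the listed properties that are present in a given axis list
pick_axis <- function(axis, props = axis_props) {
  axis[intersect(props, names(axis))]
}

# Stand-in for p2$x$layout$yaxis (values are illustrative only)
fake_yaxis <- list(showline = FALSE, gridcolor = "rgba(235,235,235,1)",
                   ticklen = 3.65, range = c(0.4, 11.6), type = "linear")

pick_axis(fake_yaxis)
```

With the real layout this would be layout(xaxis = pick_axis(x$xaxis, setdiff(axis_props, "title")), yaxis = pick_axis(x$yaxis)), mirroring the code above, which copies the title only for the y-axis.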