Split points by a factor in a plotly scatter plot

Probably an easy one.

I have data points (with error bars) that I'd like to plot. There are two grouping factors, group and cluster:

set.seed(1)
df <- data.frame(cluster=rep(LETTERS[1:10],2),group=c(rep("A",10),rep("B",10)),point=rnorm(20),err=runif(20,0.1,0.3))
df$group <- factor(df$group,levels=c("A","B"))

I'd like to plot the points where the x-axis is df$cluster, and within each cluster the points are color coded by df$group and split (so that the group A point is to the left of the group B point).

Here's what I'm trying:

library(plotly)
plot_ly(x=~df$cluster,y=~df$point,split=~df$group,type='scatter',mode="markers",showlegend=T,color=~df$group) %>%
  layout(legend=list(orientation="h",xanchor="center",x=0.5,y=1),xaxis=list(title=NA,zeroline=F,categoryorder="array",categoryarray=sort(unique(df$cluster)),showticklabels=T),yaxis=list(title="Val",zeroline=F)) %>%
  plotly::add_trace(error_y=list(array=df$err),showlegend=F)

Which gives me:

Pretty close but the only thing that's not working is splitting the points in each cluster by group.

Any idea how to get this to work? Ideally the code would be generic, so that any number of group levels are split within each cluster, rather than code that's specific to A and B in this example.

Solution

I love plotly and use it almost exclusively, but there are some nice features built into ggplot2 that require a handful of tweaks to replicate in plotly.

Still, I think it's worth really getting to know some of the more detailed ins and outs if you plan on publishing interactive plots for others to review. The R API provides an enormous amount of control to tweak and make every little detail perfect if you use the native syntax instead of ggplotly.

With that said, here's how I would tackle this problem:

(Data generation code provided in question)

library(plotly)
set.seed(1)
df <- data.frame(cluster=rep(LETTERS[1:10],2),
                 group=c(rep("A",10),
                         rep("B",10)),
                 point=rnorm(20),
                 err=runif(20,0.1,0.3))
df$group <- factor(df$group,levels=c("A","B"))

First, you need to do some manual "jittering" (strictly speaking, dodging) yourself in a systematic way. I haven't read the source code for the ggplot2 equivalent that does this "auto-magically" (position_dodge()), but I imagine something similar to this is taking place behind the curtain.

## Generate a set of offsets based on the number of groups
Offset <- data.frame(group = unique(df$group),
                     offset = seq(-0.1, 0.1,length.out = length(unique(df$group))))

## Join the offset to the data frame based on group
df <- merge(df,Offset,by = "group", all.x = TRUE)

## Calculate an x location
df$x_location <- as.numeric(as.factor(df$cluster)) + df$offset
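
Since the question asks for something generic, the offset step above could be wrapped in a small helper. This is only a sketch: dodge_offsets and its width argument are hypothetical names, not part of plotly. It also treats the single-group case separately, because seq(-0.1, 0.1, length.out = 1) returns -0.1 rather than 0.

```r
# Hypothetical helper (not a plotly function): evenly spaced offsets,
# centred on zero, one per group level.
dodge_offsets <- function(groups, width = 0.2) {
  lv <- levels(as.factor(groups))
  n <- length(lv)
  # With one group no shift is needed; seq() with length.out = 1 would
  # return its 'from' value instead of 0.
  off <- if (n == 1) 0 else seq(-width / 2, width / 2, length.out = n)
  # The column is named "group" so the result can be merged by "group".
  data.frame(group = factor(lv, levels = lv), offset = off)
}

dodge_offsets(c("A", "B"))       # two levels
dodge_offsets(c("A", "B", "C"))  # three levels
dodge_offsets("A")               # single level
```

The result can be passed to merge(df, ..., by = "group", all.x = TRUE) in place of the Offset data frame. Keep width well below 1 so that shifted points stay closer to their own cluster than to a neighbouring one.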

head(df) post-manipulation:

  group cluster      point       err offset x_location
1     A       A -0.6264538 0.2641893   -0.1        0.9
2     A       B  0.1836433 0.2294120   -0.1        1.9
3     A       C -0.8356286 0.2565866   -0.1        2.9
4     A       D  1.5952808 0.2106073   -0.1        3.9
5     A       E  0.3295078 0.2059439   -0.1        4.9
6     A       F -0.8204684 0.2578712   -0.1        5.9
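
As a quick sanity check on the table above, the sketch below rebuilds the data without plotting and confirms two things: rounding x_location recovers each cluster's integer position, and every cluster ends up with one distinct x position per group. It assumes the offsets stay below 0.5 in absolute value, as they do here.

```r
# Rebuild the data and offsets exactly as above (no plotting needed).
set.seed(1)
df <- data.frame(cluster = rep(LETTERS[1:10], 2),
                 group = factor(rep(c("A", "B"), each = 10), levels = c("A", "B")),
                 point = rnorm(20),
                 err = runif(20, 0.1, 0.3))
Offset <- data.frame(group = unique(df$group),
                     offset = seq(-0.1, 0.1, length.out = length(unique(df$group))))
df <- merge(df, Offset, by = "group", all.x = TRUE)
df$x_location <- as.numeric(as.factor(df$cluster)) + df$offset

# Rounding should recover the integer position of each cluster.
cluster_index <- as.numeric(as.factor(df$cluster))
stopifnot(all(round(df$x_location) == cluster_index))

# Each cluster should hold one distinct x position per group.
positions_per_cluster <- tapply(df$x_location, df$cluster,
                                function(x) length(unique(x)))
stopifnot(all(positions_per_cluster == length(unique(df$group))))
```

If either stopifnot() call fails after you change the offsets or add groups, the points of one cluster have drifted into another cluster's tick position.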

Now that you have an explicit x_location, you can use that on a scatter plot and then add in categorical tick marks/text using an array. Then, by displaying the values of interest in the text, you can eliminate the x and y values from the hoverinfo to fully cover your tracks.

df %>% 
  plot_ly() %>% 
  add_trace(x= ~x_location,y= ~point, color= ~group,
            text = ~paste0("Group ",group," - Cluster ", cluster,"<br>",round(point,2)),
            error_y = list(type = "data", array = ~err), 
            hoverinfo = "text",
            type = "scatter", mode = "markers") %>%
  layout(hovermode = "x",
         paper_bgcolor = 'rgba(235,235,235,0)',
         plot_bgcolor = "rgba(235,235,235,1)",
         legend=list(orientation="h",
                     xanchor="center",
                     yanchor = "bottom",
                     x=0.5,y=1,
                     bgcolor = "transparent"),
         xaxis=list(title=NA,
                    zeroline=FALSE,
                    tickmode = "array",
                    tickvals = seq_along(levels(as.factor(df$cluster))),
                    ticktext = levels(as.factor(df$cluster)),
                    gridcolor = "rgba(255,255,255,1)"),
         yaxis=list(title="Val",
                    zeroline=FALSE,
                    gridcolor = "rgba(255,255,255,1)"))