Probably an easy one.
I have data points (with error bars) that I'd like to plot.
There are two levels of grouping factors, group and cluster:
set.seed(1)
df <- data.frame(cluster=rep(LETTERS[1:10],2),group=c(rep("A",10),rep("B",10)),point=rnorm(20),err=runif(20,0.1,0.3))
df$group <- factor(df$group,levels=c("A","B"))
I'd like to plot the points so that the x-axis is df$cluster, and within each cluster the points are color-coded by df$group and split (so that the group A point sits to the left of the group B point).
Here's what I'm trying:
library(plotly)
plot_ly(x=~df$cluster,y=~df$point,split=~df$group,type='scatter',mode="markers",showlegend=T,color=~df$group) %>%
layout(legend=list(orientation="h",xanchor="center",x=0.5,y=1),xaxis=list(title=NA,zeroline=F,categoryorder="array",categoryarray=sort(unique(df$cluster)),showticklabels=T),yaxis=list(title="Val",zeroline=F)) %>%
plotly::add_trace(error_y=list(array=df$err),showlegend=F)
Pretty close, but the only thing that's not working is splitting the points in each cluster by group.
Any idea how to get this to work? Ideally the code would be generic, so that any number of group levels are split within each cluster, rather than code that's specific to the A and B of this example.
I love plotly and use it almost exclusively, but there are some nice features built into ggplot2 that require a handful of tweaks to replicate in plotly.
Still, I think it's worth really getting to know some of the more detailed ins and outs if you plan on publishing interactive plots for others to review. The R API provides an enormous amount of control for tweaking every little detail if you use the native syntax instead of ggplotly.
With that said, here's how I would tackle this problem:
(Data generation code provided in question)
library(plotly)
set.seed(1)
df <- data.frame(cluster = rep(LETTERS[1:10], 2),
                 group = c(rep("A", 10),
                           rep("B", 10)),
                 point = rnorm(20),
                 err = runif(20, 0.1, 0.3))
df$group <- factor(df$group, levels = c("A", "B"))
First, you need to do some manual "jittering" yourself in a systematic way (ggplot2 calls this dodging, as in position_dodge()). I haven't read the source code for the ggplot2 functions that do this "auto-magically", but I imagine something similar to this is taking place behind the curtain.
## Generate a set of offsets, one per group
Offset <- data.frame(group = unique(df$group),
                     offset = seq(-0.1, 0.1, length.out = length(unique(df$group))))
## Join the offsets to the data frame by group
df <- merge(df, Offset, by = "group", all.x = TRUE)
## Calculate an x location: numeric cluster position plus the group offset
df$x_location <- as.numeric(as.factor(df$cluster)) + df$offset
head(df)
The data frame after this manipulation:
  group cluster      point       err offset x_location
1     A       A -0.6264538 0.2641893   -0.1        0.9
2     A       B  0.1836433 0.2294120   -0.1        1.9
3     A       C -0.8356286 0.2565866   -0.1        2.9
4     A       D  1.5952808 0.2106073   -0.1        3.9
5     A       E  0.3295078 0.2059439   -0.1        4.9
6     A       F -0.8204684 0.2578712   -0.1        5.9
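Since the question asks for something generic, the offset step can also be wrapped in a small helper that works for any number of group levels. This is only a sketch: dodge_offsets and its width argument are hypothetical names, not part of plotly, and the default spread of 0.2 simply mirrors the -0.1 to 0.1 range used above. It looks up offsets by factor level rather than merging, so the row order of the data frame is left unchanged.

```r
# Hypothetical helper (not part of plotly): evenly spaced x offsets,
# one per group level, centred on zero.
dodge_offsets <- function(group, width = 0.2) {
  group <- as.factor(group)
  lv <- levels(group)
  n <- length(lv)
  # A single level needs no dodging; seq() with length.out = 1 would
  # otherwise return only the lower bound.
  offs <- if (n == 1) 0 else seq(-width / 2, width / 2, length.out = n)
  # Look up the offset for each observation by its level name
  unname(setNames(offs, lv)[as.character(group)])
}

set.seed(1)
df <- data.frame(cluster = rep(LETTERS[1:10], 2),
                 group = factor(rep(c("A", "B"), each = 10), levels = c("A", "B")),
                 point = rnorm(20),
                 err = runif(20, 0.1, 0.3))

# Numeric cluster position plus the per-group offset
df$x_location <- as.numeric(factor(df$cluster)) + dodge_offsets(df$group)
head(df)
```

With three group levels the same call would give offsets of -0.1, 0 and 0.1; a wider or narrower spread is a matter of taste and can be set through width.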
Now that you have an explicit x_location, you can use that on a scatter plot and then add in categorical tick marks/text using an array. Then, by displaying the values of interest in the text, you can eliminate the x and y values from the hoverinfo to fully cover your tracks.
df %>%
  plot_ly() %>%
  add_trace(x = ~x_location, y = ~point, color = ~group,
            text = ~paste0("Group ", group, " - Cluster ", cluster, "<br>", round(point, 2)),
            error_y = list(type = "data", array = ~err),
            hoverinfo = "text",
            type = "scatter", mode = "markers") %>%
  layout(hovermode = "compare",
         paper_bgcolor = "rgba(235,235,235,0)",
         plot_bgcolor = "rgba(235,235,235,1)",
         legend = list(orientation = "h",
                       xanchor = "center",
                       yanchor = "bottom",
                       x = 0.5, y = 1,
                       bgcolor = "transparent"),
         xaxis = list(title = NA,
                      zeroline = FALSE,
                      tickmode = "array",
                      tickvals = unique(as.numeric(sort(as.factor(df$cluster)))),
                      ticktext = unique(sort(as.factor(df$cluster))),
                      gridcolor = "rgba(255,255,255,1)"),
         yaxis = list(title = "Val",
                      zeroline = FALSE,
                      gridcolor = "rgba(255,255,255,1)"))
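The tick arrays passed to the x-axis can also be computed on their own, which makes it easier to check that each numeric position lines up with the right cluster label. A minimal sketch in base R, assuming the clusters are positioned by as.numeric(factor(cluster)) as above; the variable names are illustrative only:

```r
cluster <- rep(LETTERS[1:10], 2)
cluster_f <- factor(cluster)

# One tick per factor level: the position is the integer code,
# the label is the level name.
tickvals <- seq_along(levels(cluster_f))
ticktext <- levels(cluster_f)

data.frame(tickvals, ticktext)
```

These two vectors carry the same values as the unique(sort(...)) expressions in the layout call, so they could be passed to tickvals and ticktext directly.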