I would like to add a regression line to my correlation scatter plot. Unfortunately, I can't get this to work with plot_ly(). I've already tried some solutions from other posts in this forum, but none of them worked.
My data frame looks like the following (only a small part of it):
My code for the plot and the actual plot-output look like the following:
CorrelationPlot <- plot_ly(data = df.dataCorrelation, x = ~df.dataCorrelation$prod1,
y = ~df.dataCorrelation$prod2, type = 'scatter', mode = 'markers',
marker = list(size = 7, color = "#FF9999", line = list(color = "#CC0000", width = 2))) %>%
layout(title = "<b> Correlation Scatter Plot", xaxis = list(title = product1),
yaxis = list(title = product2), showlegend = FALSE)
What I want is something like this, which I produced with the ggscatter() function:
library(ggpubr)
ggscatter(df.dataCorrelation, x = "prod1", y = "prod2", color = "#CC0000", shape = 21, size = 2,
add = "reg.line", add.params = list(color = "#CC0000", size = 2), conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson", xlab = product1, ylab = product2)
How do I get the regression line with plot_ly()?
Edit, my updated code:
CorrelationPlot <- plot_ly(data = df.dataCorrelation, x = ~df.dataCorrelation$prod1,
y = ~df.dataCorrelation$prod2, type = 'scatter', mode = 'markers',
marker = list(size = 7, color = "#FF9999",
line = list(color = "#CC0000", width = 2))) %>%
add_trace(x = ~df.dataCorrelation$fitted_values, mode = "lines", type = 'scatter',
line = list(color = "black")) %>%
layout(title = "<b> Correlation Scatter Plot", xaxis = list(title = product1),
yaxis = list(title = product2), showlegend = FALSE)
This gives:
How do I get an actual regression line here?
I don't think plotly has a ready-made function like ggscatter(). Most likely you have to do it manually: fit the linear model first, then add the fitted values to a data.frame and plot them as a separate trace.
I made a data.frame that's like your data:
set.seed(111)
df.dataCorrelation = data.frame(prod1=runif(50,20,60))
df.dataCorrelation$prod2 = df.dataCorrelation$prod1 + rnorm(50,10,5)
fit = lm(prod2 ~ prod1,data=df.dataCorrelation)
fitdata = data.frame(prod1=20:60)
prediction = predict(fit,fitdata,se.fit=TRUE)
fitdata$fitted = prediction$fit
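The grid 20:60 works here because the simulated prod1 values lie in that range. For real data, a minimal sketch that derives the grid from the observed range instead (the name `grid` and the choice of 100 points are arbitrary assumptions, not part of the code above):

```r
set.seed(111)
df.dataCorrelation <- data.frame(prod1 = runif(50, 20, 60))
df.dataCorrelation$prod2 <- df.dataCorrelation$prod1 + rnorm(50, 10, 5)
fit <- lm(prod2 ~ prod1, data = df.dataCorrelation)

# Evenly spaced x values spanning the observed data (100 points is arbitrary)
grid <- data.frame(prod1 = seq(min(df.dataCorrelation$prod1),
                               max(df.dataCorrelation$prod1),
                               length.out = 100))

# Fitted values of the regression line on that grid
grid$fitted <- predict(fit, newdata = grid)
```

This keeps the line from extending beyond the data, which matters once the confidence band is drawn as well.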
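The upper and lower bounds of the confidence band around the line are approximately the fitted value plus or minus 1.96 times the standard error of the fit (an approximate 95% interval):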
fitdata$ymin = fitdata$fitted - 1.96*prediction$se.fit
fitdata$ymax = fitdata$fitted + 1.96*prediction$se.fit
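As an alternative to the 1.96 multiplier, predict() can return the bounds directly via interval = "confidence". A sketch, assuming the same simulated data as above; it uses the t quantile for the residual degrees of freedom, so the band comes out slightly wider than the normal approximation:

```r
set.seed(111)
df.dataCorrelation <- data.frame(prod1 = runif(50, 20, 60))
df.dataCorrelation$prod2 <- df.dataCorrelation$prod1 + rnorm(50, 10, 5)
fit <- lm(prod2 ~ prod1, data = df.dataCorrelation)
fitdata <- data.frame(prod1 = 20:60)

# Matrix with columns "fit", "lwr" and "upr" (95% confidence band for the mean)
ci <- predict(fit, newdata = fitdata, interval = "confidence", level = 0.95)
fitdata$fitted <- ci[, "fit"]
fitdata$ymin   <- ci[, "lwr"]
fitdata$ymax   <- ci[, "upr"]
```

The resulting columns have the same names as in the manual version, so the plotting code does not need to change.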
We calculate correlation:
COR = cor.test(df.dataCorrelation$prod1,df.dataCorrelation$prod2)[c("estimate","p.value")]
COR_text = paste(c("R=","p="), signif(as.numeric(COR), 3), collapse=" ")
And put it into plotly:
library(plotly)
df.dataCorrelation %>%
  plot_ly(x = ~prod1) %>%
  add_markers(x = ~prod1, y = ~prod2) %>%
  add_trace(data = fitdata, x = ~prod1, y = ~fitted,
            mode = "lines", type = "scatter", line = list(color = "#8d93ab")) %>%
  add_ribbons(data = fitdata, ymin = ~ymin, ymax = ~ymax,
              line = list(color = "#F1F3F8E6"), fillcolor = "#F1F3F880") %>%
  layout(
    showlegend = FALSE,
    annotations = list(x = 50, y = 50,
                       text = COR_text, showarrow = FALSE)
  )
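If you only need the line without the band, a shorter sketch is to store the fitted values in the original data frame and pass them as y (in the edited code from the question they were passed as x, which is likely why no regression line showed up). The column name `fitted`, the object name `p` and the annotation position are assumptions for illustration; xref = "paper" and yref = "paper" place the text relative to the plot area instead of at data coordinates:

```r
library(plotly)

set.seed(111)
df.dataCorrelation <- data.frame(prod1 = runif(50, 20, 60))
df.dataCorrelation$prod2 <- df.dataCorrelation$prod1 + rnorm(50, 10, 5)

# Fitted values at the observed x values
fit <- lm(prod2 ~ prod1, data = df.dataCorrelation)
df.dataCorrelation$fitted <- fitted(fit)

# Pearson correlation for the label
r_text <- sprintf("R = %.2f", cor(df.dataCorrelation$prod1, df.dataCorrelation$prod2))

p <- plot_ly(df.dataCorrelation, x = ~prod1) %>%
  add_markers(y = ~prod2) %>%
  # add_lines() sorts by x, so the line is drawn left to right
  add_lines(y = ~fitted, line = list(color = "#8d93ab")) %>%
  layout(showlegend = FALSE,
         annotations = list(xref = "paper", yref = "paper",
                            x = 0.05, y = 0.95,   # top-left of the plot area
                            text = r_text, showarrow = FALSE))
p
```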