The figure below is a conceptual diagram used by Michael Clark, https://m-clark.github.io/docs/lord/index.html to explain Lord's Paradox and related phenomena in regression.
My question is framed in this context and using ggplot2
but it is broader in terms of geometry & graphing.
I would like to reproduce figures like this, but using actual data. I need to know:
y-x
y-x
projected onto this axis.My minimal base example uses ggplot2
,
library(ggplot2)
set.seed(1234)
N <- 200
group <- rep(c(0, 1), each = N/2)
initial <- .75*group + rnorm(N, sd=.25)
final <- .4*initial + .5*group + rnorm(N, sd=.1)
change <- final - initial
df <- data.frame(id = factor(1:N),
group = factor(group,
labels = c('Female', 'Male')),
initial,
final,
change)
#head(df)
#' plot, with regression lines and data ellipses
ggplot(df, aes(x = initial, y = final, color = group)) +
geom_point() +
geom_smooth(method = "lm", formula = y~x) +
stat_ellipse(size = 1.2) +
geom_abline(slope = 1, color = "black", size = 1.2) +
coord_fixed(xlim = c(-.6, 1.2), ylim = c(-.6, 1.2)) +
theme_bw() +
theme(legend.position = c(.15, .85))
This gives the following graph:
In geometry, the coordinates of the -45 degree rotated axes of distributions I want to portray are
(y-x), (x+y) in the original space of the plot. But how can I draw these with
ggplot2
or other software?
An accepted solution can be vague about how the distribution of (y-x) is represented, but should solve the problem of how to display this on a (y-x) axis.
Fun question! I haven't encountered it yet, but there might be a package to help do this automatically. Here's a manual approach using two hacks:
clip = "off"
parameter of the coord_*
functions, to allow us to add annotations outside the plot area.First, we can make a density plot of the change from initial to final, seeing a left skewed distribution:
(my_hist <- df %>%
mutate(gain = final - initial) %>% # gain would be better name
ggplot(aes(gain)) +
geom_density())
Now we can extract the guts of that plot, and transform the coordinates to where we want them to appear in the combined plot:
a <- ggplot_build(my_hist)
rot = pi * 3/4
diag_hist <- tibble(
x = a[["data"]][[1]][["x"]],
y = a[["data"]][[1]][["y"]]
) %>%
# squish
mutate(y = y*0.2) %>%
# rotate 135 deg CCW
mutate(xy = x*cos(rot) - y*sin(rot),
dens = x*sin(rot) + y*cos(rot)) %>%
# slide
mutate(xy = xy - 0.7, # magic number based on plot range below
dens = dens - 0.7)
And here's a combination with the original plot:
ggplot(df, aes(x = initial, y = final, color = group)) +
geom_point() +
geom_smooth(method = "lm", formula = y~x) +
stat_ellipse(size = 1.2) +
geom_abline(slope = 1, color = "black", size = 1.2) +
coord_fixed(clip = "off",
xlim = c(-0.7,1.6),
ylim = c(-0.7,1.6),
expand = expansion(0)) +
annotate("segment", x = -1.4, xend = 0, y = 0, yend = -1.4) +
annotate("path", x = diag_hist$xy, y = diag_hist$dens) +
theme_bw() +
theme(legend.position = c(.15, .85),
plot.margin = unit(c(.1,.1,2,2), "cm"))