I'm attempting to have different pvalues and r^2's show up on a plot I'm making using ggplot. My plot needs to be faceted, as I have many different factors of data I'm working on. The graphs I'm trying to make should all be linear models, but I'd like each to have it's own pvalue and r^2 show up in it's respective space.
I've been trying to use sapply to split the dataframe up and then calculate the r^2's and pvalues and then plug them back into the plot using geom_text(label = examplefunction), but I keep receiving the error "Error: Aesthetics must be either length 1 or the same as the data (244): x, y, label, hjust, vjust".
Here's an example using the "tips" dataframe from the reshape package:
library(reshape)
lm_equation <- function(tips){
sapply(split(tips, list(tips$sex, tips$day)), function(tips){
m <- lm(tips$tip ~ tips$total_bill, tips);
eq <- substitute(~~italic(r)^2~"="~rvalue*","~italic(p)~"="~pvalue,
list(rvalue = sprintf("%.2f",sign(coef(m)[2])*sqrt(summary(m)$r.squared)),
pvalue = format(summary(m)$coefficients[2,4], digits = 2)))
as.character(as.expression(eq));
})
}
scat <- ggplot(tips, aes(tip, total_bill))
scat +
geom_point(size = 5, alpha = 0.9)+
labs(x = "tip", y = "bill total")+
geom_smooth(method=lm, colour = "#000000", se = F)+
facet_grid(sex~day, scales = "free")+
geom_text(x = min(tips$tip), y = max(tips$total_bill-10), label = lm_equation(tips), parse = T, vjust = "inward", hjust = "inward")+
theme_classic() +
theme(text = element_text(size = 15))
What's frustrating is the code works if I take out the split, but then the pvalues and r^2s are meaningless since they are taken from the entire dataframe rather than just that specific faceted group.
Example of working code:
lm_equation2 <- function(tips){
m <- lm(tips$tip ~ tips$total_bill, tips);
eq <- substitute(~~italic(r)^2~"="~rvalue*","~italic(p)~"="~pvalue,
list(rvalue = sprintf("%.2f",sign(coef(m)[2])*sqrt(summary(m)$r.squared)),
pvalue = format(summary(m)$coefficients[2,4], digits = 2)))
as.character(as.expression(eq));
}
scat2 <- ggplot(tips, aes(tip, total_bill))
scat2 +
geom_point(size = 5, alpha = 0.9)+
labs(x = "tip", y = "bill total")+
geom_smooth(method=lm, colour = "#000000", se = F)+
facet_grid(sex~day, scales = "free")+
geom_text(x = min(tips$tip), y = max(tips$total_bill-10), label = lm_equation2(tips), parse = T, vjust = "inward", hjust = "inward")+
theme_classic() +
theme(text = element_text(size = 15))
What am I missing here? Do I need to resort to subsetting my data?
Here is an example of taking what you have and organizing the results into a data.frame that contains all the variables you need for plotting. In particular the faceting variables must be present in the dataset.
First you can put the labels and names of each group (combo of sex
and day
) into a data.frame as columns. You will want to add a columns for the position each of each equation, using the names of the original x
and y
variables.
lab_dat = data.frame(group = names(lm_equation(tips)),
tip = min(tips$tip),
total_bill = max(tips$total_bill - 10),
label = lm_equation(tips))
lab_dat
group tip total_bill label
Female.Fri Female.Fri 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.72" * "," ~ italic(p) ~ "=" ~ "0.029"
Male.Fri Male.Fri 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.92" * "," ~ italic(p) ~ "=" ~ "0.00017"
Female.Sat Female.Sat 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.50" * "," ~ italic(p) ~ "=" ~ "0.0071"
Male.Sat Male.Sat 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.77" * "," ~ italic(p) ~ "=" ~ "1.4e-12"
Female.Sun Female.Sun 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.74" * "," ~ italic(p) ~ "=" ~ "0.00041"
Male.Sun Male.Sun 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.46" * "," ~ italic(p) ~ "=" ~ "0.00032"
Female.Thur Female.Thur 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.87" * "," ~ italic(p) ~ "=" ~ "9.4e-11"
Male.Thur Male.Thur 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.76" * "," ~ italic(p) ~ "=" ~ "1e-06"
Then you need to take the group
variable, which has combined sex
and day
, and split it back into two separate variables. I use separate()
from package tidyr for this. The new variables should be named the same as the variables in the original dataset, since these are the faceting variables and need to be present in a dataset used for any plotting layer.
library(tidyr)
lab_dat = separate(lab_dat, group, c("sex", "day"))
lab_dat
sex day tip total_bill label
Female.Fri Female Fri 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.72" * "," ~ italic(p) ~ "=" ~ "0.029"
Male.Fri Male Fri 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.92" * "," ~ italic(p) ~ "=" ~ "0.00017"
Female.Sat Female Sat 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.50" * "," ~ italic(p) ~ "=" ~ "0.0071"
Male.Sat Male Sat 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.77" * "," ~ italic(p) ~ "=" ~ "1.4e-12"
Female.Sun Female Sun 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.74" * "," ~ italic(p) ~ "=" ~ "0.00041"
Male.Sun Male Sun 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.46" * "," ~ italic(p) ~ "=" ~ "0.00032"
Female.Thur Female Thur 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.87" * "," ~ italic(p) ~ "=" ~ "9.4e-11"
Male.Thur Male Thur 1 40.81 ~~italic(r)^2 ~ "=" ~ "0.76" * "," ~ italic(p) ~ "=" ~ "1e-06"
Now you can plot one label per facet, using lab_dat
for the geom_text()
layer.
ggplot(tips, aes(tip, total_bill)) +
geom_point(size = 5, alpha = 0.9)+
geom_smooth(method=lm, colour = "#000000", se = FALSE)+
facet_grid(sex ~ day, scales = "free")+
geom_text(data = lab_dat, aes(label = label), parse = TRUE,
vjust = "inward", hjust = "inward")