Search code examples
rggplot2facetsapplygeom-text

Error using sapply and split to have different pvalues and r^2's on a facet wrap ggplot


I'm attempting to have different pvalues and r^2's show up on a plot I'm making using ggplot. My plot needs to be faceted, as I have many different factors of data I'm working on. The graphs I'm trying to make should all be linear models, but I'd like each to have it's own pvalue and r^2 show up in it's respective space.

I've been trying to use sapply to split the dataframe up and then calculate the r^2's and pvalues and then plug them back into the plot using geom_text(label = examplefunction), but I keep receiving the error "Error: Aesthetics must be either length 1 or the same as the data (244): x, y, label, hjust, vjust".

Here's an example using the "tips" dataframe from the reshape package:

library(reshape)

lm_equation <- function(tips){
  sapply(split(tips, list(tips$sex, tips$day)), function(tips){
    m <- lm(tips$tip ~ tips$total_bill, tips);
    eq <- substitute(~~italic(r)^2~"="~rvalue*","~italic(p)~"="~pvalue, 
                     list(rvalue = sprintf("%.2f",sign(coef(m)[2])*sqrt(summary(m)$r.squared)),
                          pvalue = format(summary(m)$coefficients[2,4], digits = 2)))
    as.character(as.expression(eq));
  })
}

scat <- ggplot(tips, aes(tip, total_bill))
scat +
  geom_point(size = 5, alpha = 0.9)+
  labs(x = "tip", y = "bill total")+
  geom_smooth(method=lm, colour = "#000000", se = F)+
  facet_grid(sex~day, scales = "free")+
  geom_text(x = min(tips$tip), y = max(tips$total_bill-10), label = lm_equation(tips), parse = T, vjust = "inward", hjust = "inward")+
  theme_classic() + 
  theme(text = element_text(size = 15))

What's frustrating is the code works if I take out the split, but then the pvalues and r^2s are meaningless since they are taken from the entire dataframe rather than just that specific faceted group.

Example of working code:

lm_equation2 <- function(tips){
    m <- lm(tips$tip ~ tips$total_bill, tips);
    eq <- substitute(~~italic(r)^2~"="~rvalue*","~italic(p)~"="~pvalue, 
                     list(rvalue = sprintf("%.2f",sign(coef(m)[2])*sqrt(summary(m)$r.squared)),
                          pvalue = format(summary(m)$coefficients[2,4], digits = 2)))
    as.character(as.expression(eq));
  }

scat2 <- ggplot(tips, aes(tip, total_bill))
scat2 +
  geom_point(size = 5, alpha = 0.9)+
  labs(x = "tip", y = "bill total")+
  geom_smooth(method=lm, colour = "#000000", se = F)+
  facet_grid(sex~day, scales = "free")+
  geom_text(x = min(tips$tip), y = max(tips$total_bill-10), label = lm_equation2(tips), parse = T, vjust = "inward", hjust = "inward")+
  theme_classic() + 
  theme(text = element_text(size = 15))

What am I missing here? Do I need to resort to subsetting my data?


Solution

  • Here is an example of taking what you have and organizing the results into a data.frame that contains all the variables you need for plotting. In particular the faceting variables must be present in the dataset.

    First you can put the labels and names of each group (combo of sex and day) into a data.frame as columns. You will want to add a columns for the position each of each equation, using the names of the original x and y variables.

    lab_dat = data.frame(group = names(lm_equation(tips)),
                         tip = min(tips$tip),
                         total_bill = max(tips$total_bill - 10),
                         label = lm_equation(tips))
    lab_dat
    
                      group tip total_bill                                                            label
    Female.Fri   Female.Fri   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.72" * "," ~ italic(p) ~ "=" ~ "0.029"
    Male.Fri       Male.Fri   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.92" * "," ~ italic(p) ~ "=" ~ "0.00017"
    Female.Sat   Female.Sat   1      40.81  ~~italic(r)^2 ~ "=" ~ "0.50" * "," ~ italic(p) ~ "=" ~ "0.0071"
    Male.Sat       Male.Sat   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.77" * "," ~ italic(p) ~ "=" ~ "1.4e-12"
    Female.Sun   Female.Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.74" * "," ~ italic(p) ~ "=" ~ "0.00041"
    Male.Sun       Male.Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.46" * "," ~ italic(p) ~ "=" ~ "0.00032"
    Female.Thur Female.Thur   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.87" * "," ~ italic(p) ~ "=" ~ "9.4e-11"
    Male.Thur     Male.Thur   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.76" * "," ~ italic(p) ~ "=" ~ "1e-06"
    

    Then you need to take the group variable, which has combined sex and day, and split it back into two separate variables. I use separate() from package tidyr for this. The new variables should be named the same as the variables in the original dataset, since these are the faceting variables and need to be present in a dataset used for any plotting layer.

    library(tidyr)
    lab_dat = separate(lab_dat, group, c("sex", "day"))
    lab_dat
    
                   sex  day tip total_bill                                                            label
    Female.Fri  Female  Fri   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.72" * "," ~ italic(p) ~ "=" ~ "0.029"
    Male.Fri      Male  Fri   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.92" * "," ~ italic(p) ~ "=" ~ "0.00017"
    Female.Sat  Female  Sat   1      40.81  ~~italic(r)^2 ~ "=" ~ "0.50" * "," ~ italic(p) ~ "=" ~ "0.0071"
    Male.Sat      Male  Sat   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.77" * "," ~ italic(p) ~ "=" ~ "1.4e-12"
    Female.Sun  Female  Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.74" * "," ~ italic(p) ~ "=" ~ "0.00041"
    Male.Sun      Male  Sun   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.46" * "," ~ italic(p) ~ "=" ~ "0.00032"
    Female.Thur Female Thur   1      40.81 ~~italic(r)^2 ~ "=" ~ "0.87" * "," ~ italic(p) ~ "=" ~ "9.4e-11"
    Male.Thur     Male Thur   1      40.81   ~~italic(r)^2 ~ "=" ~ "0.76" * "," ~ italic(p) ~ "=" ~ "1e-06"
    

    Now you can plot one label per facet, using lab_dat for the geom_text() layer.

    ggplot(tips, aes(tip, total_bill)) +
         geom_point(size = 5, alpha = 0.9)+
         geom_smooth(method=lm, colour = "#000000", se = FALSE)+
         facet_grid(sex ~ day, scales = "free")+
         geom_text(data = lab_dat, aes(label = label), parse = TRUE, 
                   vjust = "inward", hjust = "inward")