Search code examples
rggplot2plotfacet-grid

assign multiple geom_text() to corresponding facet_grid()


I followed this SO thread, which did not solve my problem.

Let's say I have a very large dataframe, of which a sample may look like:

> df_test
    time                pT          model Obs Pred               auc            brier
1  3-yrs pT3, pT4a or pT4b          18LNY 0.5  0.7  60.7 [46.9;74.4] 22.5 [17.8;27.2]
2  3-yrs pT3, pT4a or pT4b          18LNY 0.3  0.2  60.7 [46.9;74.4] 22.5 [17.8;27.2]
3  3-yrs        pT1 or pT2 LNY continuous 0.2  0.2  66.4 [55.4;77.4] 15.0 [11.0;19.0]
4  5-yrs        pT1 or pT2 LNY continuous 0.6  0.5  73.2 [63.4;83.0] 17.3 [13.1;21.6]

I am comparing prediction of two models in df_test$model in two different cancer stages df_test$pT at three different timepoints in df_test$time. I am using a facet_grid(pT ~ time), which prints what I expect.

enter image description here

Now, I want to add a geom_text that print df_test$auc for each df$model, i.e. two "auc readouts" in each grid, one for df_test$model == "18LNY" and one for df_test$model == "LNY continuous".

As you can see in the sample above, line 1 and 2 prints the same auc (because it belongs to the same model at the same timepoint). So I have reduced the dataframe to:

label_test <- df_test %>% select(time, pT, model, auc) %>% distinct()

# and
> label_test
    time                pT          model               auc
1  3-yrs pT3, pT4a or pT4b          18LNY  60.7 [46.9;74.4]
2  3-yrs        pT1 or pT2 LNY continuous  66.4 [55.4;77.4]
3  5-yrs        pT1 or pT2 LNY continuous  73.2 [63.4;83.0]
4   1-yr pT3, pT4a or pT4b          18LNY  59.9 [44.6;75.1]
5  5-yrs        pT1 or pT2          18LNY  73.5 [63.7;83.4]
6  3-yrs        pT1 or pT2          18LNY  66.4 [55.3;77.5]
7   1-yr        pT1 or pT2 LNY continuous 89.7 [77.7;100.0]
8  3-yrs pT3, pT4a or pT4b LNY continuous  61.1 [47.3;74.9]
9   1-yr pT3, pT4a or pT4b LNY continuous  59.7 [44.6;74.7]
10 5-yrs pT3, pT4a or pT4b          18LNY  63.5 [48.7;78.4]
11 5-yrs pT3, pT4a or pT4b LNY continuous  63.5 [48.6;78.4]

I then tried

ggplot(df_test, aes(x = Pred, y = Obs, color = model)) + 
  facet_grid(time ~ pT) +
  geom_line() +
  geom_text(data = label_test,
            aes(x = 0, y = c(0.8, 0.9), label = auc), 
            size = 5, hjust = 0, show.legend = FALSE)

Which gives

Error in check_aesthetics(): ! Aesthetics must be either length 1 or the same as the data (11): y

How to solve?

Data

df_test <- structure(list(time = c("3-yrs", "3-yrs", "3-yrs", "5-yrs", "1-yr", 
"5-yrs", "1-yr", "3-yrs", "1-yr", "5-yrs", "1-yr", "3-yrs", "5-yrs", 
"1-yr", "3-yrs", "5-yrs", "3-yrs", "3-yrs", "5-yrs", "5-yrs", 
"3-yrs", "3-yrs", "3-yrs", "5-yrs", "3-yrs"), pT = c("pT3, pT4a or pT4b", 
"pT3, pT4a or pT4b", "pT1 or pT2", "pT1 or pT2", "pT3, pT4a or pT4b", 
"pT1 or pT2", "pT3, pT4a or pT4b", "pT1 or pT2", "pT1 or pT2", 
"pT1 or pT2", "pT1 or pT2", "pT3, pT4a or pT4b", "pT1 or pT2", 
"pT3, pT4a or pT4b", "pT3, pT4a or pT4b", "pT1 or pT2", "pT1 or pT2", 
"pT1 or pT2", "pT3, pT4a or pT4b", "pT3, pT4a or pT4b", "pT3, pT4a or pT4b", 
"pT3, pT4a or pT4b", "pT3, pT4a or pT4b", "pT1 or pT2", "pT1 or pT2"
), model = c("18LNY", "18LNY", "LNY continuous", "LNY continuous", 
"18LNY", "18LNY", "18LNY", "18LNY", "LNY continuous", "18LNY", 
"LNY continuous", "LNY continuous", "LNY continuous", "LNY continuous", 
"LNY continuous", "LNY continuous", "18LNY", "LNY continuous", 
"18LNY", "LNY continuous", "18LNY", "LNY continuous", "LNY continuous", 
"LNY continuous", "LNY continuous"), Obs = c(0.5, 0.3, 0.2, 0.6, 
0.2, 0.3, 0.3, 0.2, 0.4, 0.7, 0.4, 0.5, 0.5, 0.1, 0.5, 0.2, 0.1, 
0.2, 0.6, 0.7, 0.5, 0.3, 0.3, 0.2, 0.6), Pred = c(0.7, 0.2, 0.2, 
0.5, 0.2, 0.3, 0.3, 0.2, 0.2, 0.5, 0.5, 0.7, 0.4, 0.2, 0.5, 0.2, 
0.1, 0.2, 0.6, 0.7, 0.6, 0.2, 0.3, 0, 0.8), auc = c("60.7 [46.9;74.4]", 
"60.7 [46.9;74.4]", "66.4 [55.4;77.4]", "73.2 [63.4;83.0]", "59.9 [44.6;75.1]", 
"73.5 [63.7;83.4]", "59.9 [44.6;75.1]", "66.4 [55.3;77.5]", "89.7 [77.7;100.0]", 
"73.5 [63.7;83.4]", "89.7 [77.7;100.0]", "61.1 [47.3;74.9]", 
"73.2 [63.4;83.0]", "59.7 [44.6;74.7]", "61.1 [47.3;74.9]", "73.2 [63.4;83.0]", 
"66.4 [55.3;77.5]", "66.4 [55.4;77.4]", "63.5 [48.7;78.4]", "63.5 [48.6;78.4]", 
"60.7 [46.9;74.4]", "61.1 [47.3;74.9]", "61.1 [47.3;74.9]", "73.2 [63.4;83.0]", 
"66.4 [55.4;77.4]"), brier = c("22.5 [17.8;27.2]", "22.5 [17.8;27.2]", 
"15.0 [11.0;19.0]", "17.3 [13.1;21.6]", "13.2 [8.7;17.7]", "17.2 [12.9;21.4]", 
"13.2 [8.7;17.7]", "14.9 [10.9;18.9]", "4.5 [2.1;6.8]", "17.2 [12.9;21.4]", 
"4.5 [2.1;6.8]", "22.4 [17.8;27.1]", "17.3 [13.1;21.6]", "13.3 [8.8;17.7]", 
"22.4 [17.8;27.1]", "17.3 [13.1;21.6]", "14.9 [10.9;18.9]", "15.0 [11.0;19.0]", 
"24.0 [18.4;29.6]", "23.9 [18.3;29.4]", "22.5 [17.8;27.2]", "22.4 [17.8;27.1]", 
"22.4 [17.8;27.1]", "17.3 [13.1;21.6]", "15.0 [11.0;19.0]")), class = "data.frame", row.names = c(NA, 
-25L))

Solution

  • One option to fix your issue would be to add the y position for the labels to your labels_df which could then be mapped on the y aes.

    library(dplyr)
    library(ggplot2)
    
    label_test <- df_test %>% 
      select(time, pT, model, auc) %>% 
      distinct() %>% 
      group_by(time, pT) %>% 
      mutate(y = seq(.8, .9, length.out = n())) %>% 
      ungroup()
    
    ggplot(df_test, aes(x = Pred, y = Obs, color = model)) + 
      facet_grid(time ~ pT) +
      geom_line() +
      geom_text(data = label_test,
                aes(x = 0, y = y, label = auc), 
                size = 5, hjust = 0, show.legend = FALSE)
    

    enter image description here