I followed this SO thread, which did not solve my problem.
Let's say I have a very large dataframe, of which a sample may look like:
> df_test
time pT model Obs Pred auc brier
1 3-yrs pT3, pT4a or pT4b 18LNY 0.5 0.7 60.7 [46.9;74.4] 22.5 [17.8;27.2]
2 3-yrs pT3, pT4a or pT4b 18LNY 0.3 0.2 60.7 [46.9;74.4] 22.5 [17.8;27.2]
3 3-yrs pT1 or pT2 LNY continuous 0.2 0.2 66.4 [55.4;77.4] 15.0 [11.0;19.0]
4 5-yrs pT1 or pT2 LNY continuous 0.6 0.5 73.2 [63.4;83.0] 17.3 [13.1;21.6]
I am comparing prediction of two models in df_test$model
in two different cancer stages df_test$pT
at three different timepoints in df_test$time
. I am using a facet_grid(pT ~ time)
, which prints what I expect.
Now, I want to add a geom_text
that print df_test$auc
for each df$model
, i.e. two "auc readouts" in each grid, one for df_test$model == "18LNY"
and one for df_test$model == "LNY continuous"
.
As you can see in the sample above, line 1 and 2 prints the same auc (because it belongs to the same model at the same timepoint). So I have reduced the dataframe to:
label_test <- df_test %>% select(time, pT, model, auc) %>% distinct()
# and
> label_test
time pT model auc
1 3-yrs pT3, pT4a or pT4b 18LNY 60.7 [46.9;74.4]
2 3-yrs pT1 or pT2 LNY continuous 66.4 [55.4;77.4]
3 5-yrs pT1 or pT2 LNY continuous 73.2 [63.4;83.0]
4 1-yr pT3, pT4a or pT4b 18LNY 59.9 [44.6;75.1]
5 5-yrs pT1 or pT2 18LNY 73.5 [63.7;83.4]
6 3-yrs pT1 or pT2 18LNY 66.4 [55.3;77.5]
7 1-yr pT1 or pT2 LNY continuous 89.7 [77.7;100.0]
8 3-yrs pT3, pT4a or pT4b LNY continuous 61.1 [47.3;74.9]
9 1-yr pT3, pT4a or pT4b LNY continuous 59.7 [44.6;74.7]
10 5-yrs pT3, pT4a or pT4b 18LNY 63.5 [48.7;78.4]
11 5-yrs pT3, pT4a or pT4b LNY continuous 63.5 [48.6;78.4]
I then tried
ggplot(df_test, aes(x = Pred, y = Obs, color = model)) +
facet_grid(time ~ pT) +
geom_line() +
geom_text(data = label_test,
aes(x = 0, y = c(0.8, 0.9), label = auc),
size = 5, hjust = 0, show.legend = FALSE)
Which gives
Error in
check_aesthetics()
: ! Aesthetics must be either length 1 or the same as the data (11): y
How to solve?
Data
df_test <- structure(list(time = c("3-yrs", "3-yrs", "3-yrs", "5-yrs", "1-yr",
"5-yrs", "1-yr", "3-yrs", "1-yr", "5-yrs", "1-yr", "3-yrs", "5-yrs",
"1-yr", "3-yrs", "5-yrs", "3-yrs", "3-yrs", "5-yrs", "5-yrs",
"3-yrs", "3-yrs", "3-yrs", "5-yrs", "3-yrs"), pT = c("pT3, pT4a or pT4b",
"pT3, pT4a or pT4b", "pT1 or pT2", "pT1 or pT2", "pT3, pT4a or pT4b",
"pT1 or pT2", "pT3, pT4a or pT4b", "pT1 or pT2", "pT1 or pT2",
"pT1 or pT2", "pT1 or pT2", "pT3, pT4a or pT4b", "pT1 or pT2",
"pT3, pT4a or pT4b", "pT3, pT4a or pT4b", "pT1 or pT2", "pT1 or pT2",
"pT1 or pT2", "pT3, pT4a or pT4b", "pT3, pT4a or pT4b", "pT3, pT4a or pT4b",
"pT3, pT4a or pT4b", "pT3, pT4a or pT4b", "pT1 or pT2", "pT1 or pT2"
), model = c("18LNY", "18LNY", "LNY continuous", "LNY continuous",
"18LNY", "18LNY", "18LNY", "18LNY", "LNY continuous", "18LNY",
"LNY continuous", "LNY continuous", "LNY continuous", "LNY continuous",
"LNY continuous", "LNY continuous", "18LNY", "LNY continuous",
"18LNY", "LNY continuous", "18LNY", "LNY continuous", "LNY continuous",
"LNY continuous", "LNY continuous"), Obs = c(0.5, 0.3, 0.2, 0.6,
0.2, 0.3, 0.3, 0.2, 0.4, 0.7, 0.4, 0.5, 0.5, 0.1, 0.5, 0.2, 0.1,
0.2, 0.6, 0.7, 0.5, 0.3, 0.3, 0.2, 0.6), Pred = c(0.7, 0.2, 0.2,
0.5, 0.2, 0.3, 0.3, 0.2, 0.2, 0.5, 0.5, 0.7, 0.4, 0.2, 0.5, 0.2,
0.1, 0.2, 0.6, 0.7, 0.6, 0.2, 0.3, 0, 0.8), auc = c("60.7 [46.9;74.4]",
"60.7 [46.9;74.4]", "66.4 [55.4;77.4]", "73.2 [63.4;83.0]", "59.9 [44.6;75.1]",
"73.5 [63.7;83.4]", "59.9 [44.6;75.1]", "66.4 [55.3;77.5]", "89.7 [77.7;100.0]",
"73.5 [63.7;83.4]", "89.7 [77.7;100.0]", "61.1 [47.3;74.9]",
"73.2 [63.4;83.0]", "59.7 [44.6;74.7]", "61.1 [47.3;74.9]", "73.2 [63.4;83.0]",
"66.4 [55.3;77.5]", "66.4 [55.4;77.4]", "63.5 [48.7;78.4]", "63.5 [48.6;78.4]",
"60.7 [46.9;74.4]", "61.1 [47.3;74.9]", "61.1 [47.3;74.9]", "73.2 [63.4;83.0]",
"66.4 [55.4;77.4]"), brier = c("22.5 [17.8;27.2]", "22.5 [17.8;27.2]",
"15.0 [11.0;19.0]", "17.3 [13.1;21.6]", "13.2 [8.7;17.7]", "17.2 [12.9;21.4]",
"13.2 [8.7;17.7]", "14.9 [10.9;18.9]", "4.5 [2.1;6.8]", "17.2 [12.9;21.4]",
"4.5 [2.1;6.8]", "22.4 [17.8;27.1]", "17.3 [13.1;21.6]", "13.3 [8.8;17.7]",
"22.4 [17.8;27.1]", "17.3 [13.1;21.6]", "14.9 [10.9;18.9]", "15.0 [11.0;19.0]",
"24.0 [18.4;29.6]", "23.9 [18.3;29.4]", "22.5 [17.8;27.2]", "22.4 [17.8;27.1]",
"22.4 [17.8;27.1]", "17.3 [13.1;21.6]", "15.0 [11.0;19.0]")), class = "data.frame", row.names = c(NA,
-25L))
One option to fix your issue would be to add the y
position for the labels to your labels_df
which could then be mapped on the y
aes.
library(dplyr)
library(ggplot2)
label_test <- df_test %>%
select(time, pT, model, auc) %>%
distinct() %>%
group_by(time, pT) %>%
mutate(y = seq(.8, .9, length.out = n())) %>%
ungroup()
ggplot(df_test, aes(x = Pred, y = Obs, color = model)) +
facet_grid(time ~ pT) +
geom_line() +
geom_text(data = label_test,
aes(x = 0, y = y, label = auc),
size = 5, hjust = 0, show.legend = FALSE)