I have a biomarker for which I would like to model the relative variations (var
, %) of concentrations (in g/L) over 7 successive times ('Ct0
' to 'Ct6
'), using geom_line
and geom_point
.
The basic data are these:
> dat0
# A tibble: 10 × 8
id Ct0 Ct1 Ct2 Ct3 Ct4 Ct5 Ct6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 9 6 NA NA NA NA NA
2 2 6 8 12 NA NA NA NA
3 3 19 21 38 18 16 6 14
4 4 36 27 25 12 13 21 24
5 5 11 17 NA NA NA NA NA
6 6 29 14 16 7 NA NA NA
7 7 4 12 18 35 31 NA NA
8 8 32 30 29 NA NA NA NA
9 9 39 37 35 40 37 39 NA
10 10 2 3 21 22 NA NA NA
The criteria to take into account are these:
var
are to be calculated relative to the previous Ct, not relative to the initial concentration Ct0 (except for the first variation Ct1 vs Ct0 of course),var
: the geom_line
segment and the 2 geom_point
are green,var
is >=50% down or up, the geom_line
segment and the 2 geom_point
are red,var
is >=20% down or up, the geom_line
segment and the 2 geom_point
are red,geom_points
, red takes priority over green (i.e. if the previous var
is green and the next var
is red, or if the previous var
is red and the next var
is green, the geom_point
at the intersection of the 2 geom_line
segments is red in both cases),id
cases with at least one red var
over the period are shown in front of the graph (i.e. not hidden by the green geom_line
at the intersection points), and for geom_line
segments or geom_points
of distinct cases that may overlap, red always takes priority over green for both.Here is my approach:
First, pivot_longer dat0
:
dat1 <- dat0 |>
pivot_longer(
cols = c(2:8),
names_to = "time",
names_prefix = "Ct",
values_to = "conc",
values_drop_na = TRUE
)
dat1$id <- as.factor(dat1$id)
dat1$time <- as.factor(dat1$time)
Second, mutate var
:
dat1 <- dat1 |>
arrange(id, time) |>
group_by(id) |>
mutate(var = 100 * (conc - lag(conc)) / lag(conc)) |>
ungroup()
dat1$var <- round(dat1$var, 1)
Third, mutate grcol
(green=0 or red=1 color):
dat1 <- dat1 |>
arrange(id, time) |>
group_by(id) |>
mutate(
grcol =
case_when(
!is.na(var) & lag(conc) <= 15 & conc <= 15 ~ 0,
!is.na(var) & lag(conc) <= 15 & conc > 15 & abs(var) < 50 ~ 0,
!is.na(var) & lag(conc) <= 15 & conc > 15 & abs(var) >= 50 ~ 1,
!is.na(var) & lag(conc) > 15 & conc <= 15 & abs(var) < 50 ~ 0,
!is.na(var) & lag(conc) > 15 & conc <= 15 & abs(var) >= 50 ~ 1,
!is.na(var) & lag(conc) > 15 & conc > 15 & abs(var) < 20 ~ 0,
!is.na(var) & lag(conc) > 15 & conc > 15 & abs(var) >= 20 ~ 1,
TRUE ~ NA)) |>
ungroup()
dat1$grcol <- as.factor(dat1$grcol)
Then plot:
grcol_color <- c("green", "red")
ggplot(dat1, aes(time, conc, group = id, colour = lead(grcol))) +
scale_colour_manual(values = grcol_color) +
geom_hline(yintercept = 15) + # cut-off
geom_line(linewidth = 1.5) +
geom_point(size = 4)
Note that because of the var
calculation based on lag
in dat1
, I use lead
for the colour
argument, otherwise the first segments are gray (because of the NAs at Ct0 in dat1
) and the following ones do not have the expected colors (because they are shifted).
The graph is close to the goal; however, three problems persist:
colour = lead(grcol)
a priori) whereas they should be the same color as the last segment,How to solve these problems? Is the pivot_longer approach the most appropriate or is there a simpler way?
Thanks for help or advice, and for time
Data:
dat0 <-
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Ct0 = c(9,
6, 19, 36, 11, 29, 4, 32, 39, 2), Ct1 = c(6, 8, 21, 27, 17, 14,
12, 30, 37, 3), Ct2 = c(NA, 12, 38, 25, NA, 16, 18, 29, 35, 21
), Ct3 = c(NA, NA, 18, 12, NA, 7, 35, NA, 40, 22), Ct4 = c(NA,
NA, 16, 13, NA, NA, 31, NA, 37, NA), Ct5 = c(NA, NA, 6, 21, NA,
NA, NA, NA, 39, NA), Ct6 = c(NA, NA, 14, 24, NA, NA, NA, NA,
NA, NA)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-10L))
I would focus here on reshaping your data frame such that each row represents one of the segments to be plotted. It is then easy to apply your rules to that segment. Draw two layers of points - one for the start points and one for the end points. Finally, draw those two points layers again but with only the "red" rows.
dat0 %>%
pivot_longer(-1, names_pattern = "Ct(.*)", names_to = "start",
values_to = "start_value") %>%
mutate(start = as.numeric(start)) %>%
mutate(end = lead(start, 1), end_value = lead(start_value, 1), .by = "id") %>%
filter(complete.cases(.)) %>%
mutate(change = abs(1 - (end_value / start_value) )) %>%
mutate(either = start_value > 15 | end_value > 15,
both = start_value > 15 & end_value > 15) %>%
mutate(color = ifelse((either & (change >= 0.5)) |
(both & (change >= 0.2)), "red", "green3")) %>%
ggplot(aes(x = start, y = start_value, color = color)) +
geom_segment(aes(xend = end, yend = end_value), linewidth = 1.5) +
geom_point(size = 4) +
geom_point(aes(x = end, y = end_value), size = 4) +
geom_point(data = . %>% filter(color == "red"), size = 4) +
geom_point(aes(x = end, y = end_value), data = . %>% filter(color == "red"),
size = 4) +
scale_color_identity() +
labs(x = "time", y = "conc") +
theme_minimal(20)