I have a dataset that looks like below
df <- data.frame(
id = 1:6, # New id column
drinks_on_previous_day1 = c(5,4,2,1,5,4),
drinks_on_previous_day2 = c(5,4,2,1,5,5),
drinks_on_previous_day3 = c(5,4,2,1,5,6),
drinks_on_previous_day4 = c(5,4,2,1,5,7),
drinks_on_previous_day5 = c(5,4,2,1,5,8),
completion_date = as.Date(c("2023-04-18 16:32:17", "2023-04-21 16:32:17", "2023-04-24 16:32:17", "2023-04-25 16:32:17", "2023-05-01 16:32:17", "2023-05-02 16:32:17"))
)
I have the drinks questions indicating the previous days, but I don't have exact dates. These would be different for each participant.
I have their completion date.
For each participant, First I'd like to calculate the dates based on their completion date. I am completely lost as to how to do that for each participant. Finally, I would like the data to be in a long format.
Here is what the dataframe should look like:
df_long <- df %>%
pivot_longer(
cols = starts_with("drinks_on_previous_day"), # Select columns to pivot
names_to = "day", # Name of the new column for day values
values_to = "drinks" # Name of the new column for drinks values
)
But instead of column names in the day column, I'd like to be the exact 5 dates for each participant based on their completion date.
Finally, I would like to plot a heatmap with the drinks on each day with one line for each participant with different colors for different number of drinks.
>5 drinks -> red
<5 drinks -> green
I am not quite sure about the plot, but we can add the dates to the data:
library(tidyverse)
df %>%
pivot_longer(
cols = starts_with("drinks_on_previous_day"), # Select columns to pivot
names_to = "day", # Name of the new column for day values
values_to = "drinks" # Name of the new column for drinks values
) %>%
mutate(day = as.numeric(gsub("drinks_on_previous_day", "", day))) %>%
na.omit() %>%
mutate(drink_date = completion_date + day - max(day),
drink_threshold = if_else(drinks >= 5, ">= 5", "< 5"),
.by = id) -> df_long
#> # A tibble: 30 × 6
#> id completion_date day drinks drink_date drink_threshold
#> <int> <date> <dbl> <dbl> <date> <chr>
#> 1 1 2023-04-18 1 5 2023-04-14 >= 5
#> 2 1 2023-04-18 2 5 2023-04-15 >= 5
#> 3 1 2023-04-18 3 5 2023-04-16 >= 5
#> 4 1 2023-04-18 4 5 2023-04-17 >= 5
#> 5 1 2023-04-18 5 5 2023-04-18 >= 5
#> 6 2 2023-04-21 1 4 2023-04-17 < 5
#> 7 2 2023-04-21 2 4 2023-04-18 < 5
#> 8 2 2023-04-21 3 4 2023-04-19 < 5
#> 9 2 2023-04-21 4 4 2023-04-20 < 5
#> 10 2 2023-04-21 5 4 2023-04-21 < 5
#> # ℹ 20 more rows
I created a heatmap-like plot using ggplot2:
ggplot(df_long) +
geom_bar(aes(drink_date, id, fill = drink_threshold),
stat = "identity", color = "black", width=1) +
facet_wrap(~id, ncol = 1, strip.position="left", scales = "free_y") +
scale_x_date(date_breaks = "1 day", expand =c(0.01,0.01)) +
scale_fill_manual(values=c("springgreen2", "firebrick2")) +
theme_gray() +
theme(axis.text.y = element_blank(),
axis.title.y = element_text(angle = 0, vjust = 0.5),
line = element_blank(),
axis.text.x = element_text(angle = 90, vjust = 0.5),
panel.spacing = unit(0, "lines"),
strip.text.y.left = element_text(angle = 0),
strip.background = element_rect(colour="white", fill="white"))