I have a dataset (I made a fake one here) that I need to turn into a graph. The issue is that there are multiple IDs, and each ID has multiple x and y values at different time points. I want to plot all of these in the same graph, but there are more than 400 IDs, each with two variables. Here is example code with a similar structure:
ID <- c(1, 2, 3)
age1 <- c(10, 15, 20)
age2 <- c(11, 16, 21)
age3 <- c(12, 17, 22)
weight1 <- c(30, 50, 60)
weight2 <- c(35, 55, 65)
weight3 <- c(38, 56, 67)
df <- data.frame(ID, age1, age2, age3, weight1, weight2, weight3)
I want it to be like this
I can't simply delete duplicates, because many of the values are genuine duplicates.
I have tried:
library(dplyr)  # for %>%
library(tidyr)  # for gather()
df_new <- df %>%
  gather(age, age_value, age1:age3) %>%
  gather(weight, weight_value, weight1:weight3)
However, this gives me multiple duplicates of the y values.
I am currently at a loss as to how to solve this.
Seems like you want this:
tidyr::pivot_longer(df, -ID, names_to = c(".value", "time"), names_pattern = "(\\w+)(\\d+)")
Output:
# A tibble: 9 × 4
     ID time    age weight
  <dbl> <chr> <dbl>  <dbl>
1     1 1        10     30
2     1 2        11     35
3     1 3        12     38
4     2 1        15     50
5     2 2        16     55
6     2 3        17     56
7     3 1        20     60
8     3 2        21     65
9     3 3        22     67
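Once the data is in this long shape, drawing every ID in one graph could look roughly like the sketch below. It assumes that age is the x variable and weight is the y variable, which the question implies but does not state, and it uses ggplot2, which the question does not mention. The names df_long and p are just illustrative.

```r
library(tidyr)
library(ggplot2)

# Example data from the question
df <- data.frame(
  ID = c(1, 2, 3),
  age1 = c(10, 15, 20), age2 = c(11, 16, 21), age3 = c(12, 17, 22),
  weight1 = c(30, 50, 60), weight2 = c(35, 55, 65), weight3 = c(38, 56, 67)
)

# Reshape: one row per ID and time point, with age and weight as columns
df_long <- pivot_longer(
  df, -ID,
  names_to = c(".value", "time"),
  names_pattern = "(\\w+)(\\d+)"
)

# Assumption: age on the x axis, weight on the y axis, one line per ID
p <- ggplot(df_long, aes(x = age, y = weight, group = ID, colour = factor(ID))) +
  geom_line() +
  geom_point()
print(p)
```

With 400+ IDs the colour legend will probably be too long to be useful, so adding theme(legend.position = "none") or dropping the colour mapping may be worth considering.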
Note:
The names_pattern is a regular expression with two capture groups: a run of word characters, then a run of digits. The digits go into the time column, whereas the word part (matched to ".value") tells us which column the value should go into, either the age or the weight column.
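One caveat, in case the real data has more than nine time points: \\w also matches digits, so with a name like age10 the first group would capture "age1" and only "0" would be left for time. A possible variant, sketched below on a made-up data frame (wide and long are hypothetical names, and the values are invented), uses \\D+ for the first group. This assumes the variable names themselves contain no digits.

```r
library(tidyr)

# Hypothetical wide data with a two-digit suffix (not from the question)
wide <- data.frame(ID = 1, age9 = 18, age10 = 19, weight9 = 70, weight10 = 72)

long <- pivot_longer(
  wide, -ID,
  names_to = c(".value", "time"),
  # \\D+ = one or more non-digits, so all trailing digits end up in `time`
  names_pattern = "(\\D+)(\\d+)",
  # Optional: store time as an integer instead of a character string
  names_transform = list(time = as.integer)
)
print(long)
```

Converting time to an integer is optional, but it keeps 10 sorted after 9 if time is later used for ordering or as an axis.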