Tidying a dataset with multiple variables and multiple entries per variable to create graphs in R


I have a dataset (I made a fake one here) that I need to make a graph from. The issue is that there are multiple IDs, and each ID has multiple x and y values at different time points. I want to plot all of these in the same graph, but there are more than 400 IDs, each with two variables. Here is example code with a similar structure:

ID <- c(1, 2, 3)
age1 <- c(10, 15, 20)
age2 <- c(11, 16, 21)
age3 <- c(12, 17, 22)
weight1 <- c(30, 50, 60)
weight2 <- c(35, 55, 65)
weight3 <- c(38, 56, 67)
df <- data.frame(ID, age1, age2, age3, weight1, weight2, weight3)

I want it to look like this:

[![enter image description here](https://i.sstatic.net/08aIY.png)](https://i.sstatic.net/08aIY.png)

I can't simply delete duplicates, because many of the numbers are duplicates.

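I have tried:

library(tidyr)  # provides gather() and re-exports %>%

# Gather the age columns, then the weight columns, in two separate steps
df_new <- df %>%
  gather(age, age_value, age1:age3) %>%
  gather(weight, weight_value, weight1:weight3)

However, this gives me multiple duplicates of the y values.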

I am currently at a loss on how to solve this issue.


Solution

  • Seems like you want this:

    tidyr::pivot_longer(df, -ID, names_to = c(".value", "time"), names_pattern = "(\\w+)(\\d+)")
    

    Output:

    # A tibble: 9 × 4
         ID time    age weight
      <dbl> <chr> <dbl>  <dbl>
    1     1 1        10     30
    2     1 2        11     35
    3     1 3        12     38
    4     2 1        15     50
    5     2 2        16     55
    6     2 3        17     56
    7     3 1        20     60
    8     3 2        21     65
    9     3 3        22     67
    

    Note:

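    names_pattern is a regular expression with two capture groups: (\\w+) matches one or more word characters (letters, digits or underscores) and (\\d+) matches one or more digits. The digits go into the time column, whereas the first group, matched to the special ".value" name, tells pivot_longer which output column the value belongs in, either age or weight. Because \\w also matches digits, a multi-digit suffix such as age10 would be split into age1 and 0; if your real column names have such suffixes, a stricter pattern such as "([A-Za-z]+)(\\d+)" avoids that.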
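
Once the data is in this long shape, plotting every ID in one graph is a single ggplot2 call. The target image from the question is not reproduced here, so this is only a minimal sketch under the assumption that the goal is weight against age with one line per ID. The use of ggplot2 and the names df_long and p are assumptions for illustration, not part of the original answer.

```r
library(tidyr)
library(ggplot2)

# Example data with the same structure as in the question
df <- data.frame(
  ID = c(1, 2, 3),
  age1 = c(10, 15, 20), age2 = c(11, 16, 21), age3 = c(12, 17, 22),
  weight1 = c(30, 50, 60), weight2 = c(35, 55, 65), weight3 = c(38, 56, 67)
)

# Reshape so that each row is one (ID, time) observation
df_long <- pivot_longer(
  df, -ID,
  names_to = c(".value", "time"),
  names_pattern = "(\\w+)(\\d+)"
)

# One line per ID; `group` keeps the points of each ID connected
p <- ggplot(df_long, aes(x = age, y = weight, group = ID, colour = factor(ID))) +
  geom_line() +
  geom_point() +
  labs(colour = "ID")

print(p)
```

With more than 400 IDs the colour legend would likely be unreadable, so it may be worth dropping the colour mapping or adding theme(legend.position = "none").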