Search code examples
rdataframedata-manipulation

How do you combine columns based on shared values in another column?


I want combine the values of columns into a particular set of rows based on a shared row identifier. Here is a screenshot example of the data in Excel (left is current state and right is how I want it to turn out) screenshot of data with the same information below (top is current state and bottom is how I want it to turn out.

id code group minutes minutes 1 minutes 2 minutes 3
1e x222 1 4.5 4.5 NA NA
1e x124 1 4.5 4.5 NA NA
1e x143 2 6.7 NA 6.7 NA
1e x123 2 6.7 NA 6.7 NA
1e x123 2 6.7 NA 6.7 NA
1e x143 3 8.9 NA NA 8.9
1e x123 3 8.9 NA NA 8.9
id code group minutes 1 minutes 2 minutes 3
1e x222 1 4.5 6.7 8.9
1e x124 1 4.5 6.7 8.9

I want to find a way to fill the NAs (for minutes 2 and 3) in the group 1 rows with the values for minutes 2 and 3 when they share an id (and drop the rows for groups 2 and 3). There may not necessarily be a group 2 or 3 for an id, and the number of codes vary for group 1.

I have tried reshape(), pivot_wider() (on the minutes column), and a few other things, but it hasn't worked for me yet. I feel like the solution is simple but for some reason I am stuck. Any advice would be appreciated! I would prefer to do this in dplyr/the tidyverse if possible.


Solution

  • We could use fill from tidyr to replace the NAs with previous or next non-NAs after grouping

    library(dplyr)
    library(tidyr)
    df1  %>% 
      group_by(id) %>% 
      fill(starts_with("minutes"), .direction = "downup") %>%
      ungroup