Search code examples
rdplyrrows

DPLYR - merging rows together using a column value as a conditional


I have a series of rows in a single dataframe. I'm trying to aggregate the first two rows for each ID- i.e. - I want to combine events 1 and 2 for ID 1 into a single row, events 1 and 2 for ID 2 into a singlw row etc, but leave event 3 completely untouched.

id <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)

event <- c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)

score <- c(3,NA,1,3,NA,2,6,NA,1,8,NA,2,4,NA,1)

score2 <- c(NA,4,1,NA,5,2,NA,0,3,NA,5,6,NA,8,7)

df <- tibble(id, event, score, score2)
# A tibble: 15 x 4
      id event score score2
   <dbl> <dbl> <dbl>  <dbl>
 1     1     1     3     NA
 2     1     2    NA      4
 3     1     3     1      1
 4     2     1     3     NA
 5     2     2    NA      5
 6     2     3     2      2
 7     3     1     6     NA
 8     3     2    NA      0
 9     3     3     1      3
10     4     1     8     NA
11     4     2    NA      5
12     4     3     2      6
13     5     1     4     NA
14     5     2    NA      8
15     5     3     1      7

I've tried :

 df_merged<- df %>% group_by (id) %>% summarise_all(funs(min(as.character(.),na.rm=TRUE))),

which aggregates these nicely, but then I struggle to merge these back into the orignal dataframe/tibble (there are really about 300 different "score" columns in the full dataset, so a right_join is a headache with score.x, score.y, score2.x, score2.y all over the place...)

Ideally, the situation would need to be dplyr as the rest of my code runs on this!

EDIT:

Ideally, my expected output would be:

# A tibble: 10 x 4
      id event score score2
   <dbl> <dbl> <dbl>  <dbl>
 1     1     1     3      4
 3     1     3     1      1
 4     2     1     3      5
 6     2     3     2      2
 7     3     1     6      0
 9     3     3     1      3
10     4     1     8      5
12     4     3     2      6
13     5     1     4      8
15     5     3     1      7

Solution

  • We may change the order of NA elements with replace

    library(dplyr)
    df %>%
        group_by(id) %>%
         mutate(across(starts_with('score'),
             ~replace(., 1:2, .[1:2][order(is.na(.[1:2]))])))  %>% 
        ungroup  %>%   
        filter(if_all(starts_with('score'), Negate(is.na)))    
    

    -output

    # A tibble: 10 x 4
          id event score score2
       <dbl> <dbl> <dbl>  <dbl>
     1     1     1     3      4
     2     1     3     1      1
     3     2     1     3      5
     4     2     3     2      2
     5     3     1     6      0
     6     3     3     1      3
     7     4     1     8      5
     8     4     3     2      6
     9     5     1     4      8
    10     5     3     1      7