Search code examples
rmergesummarize

R: Combine rows with same ID


Edit: I changed Var4 to a string value as my question was not precise enough about my data and therefore answers were failing because of invalid types. Sorry for that

this is my first question here and I hope someone can help me.

I have the following data set:

ID Date N_Date Var1 Var2 Var3 Var4 type
1 4.7.22 50000 12 NA NA NA normal
1 4.7.22 50000 NA 23 NA NA normal
1 4.7.22 50000 NA NA 5 NA normal
1 4.7.22 50000 NA NA NA asd normal
2 4.7.22 50000 NA 2 NA NA normal
3 5.7.22 20000 7 NA NA NA normal

My goal is to have just one row for each ID. So what I want R to do, is to shift the Var column values for each ID up or somehow combine them. As you can see, at the moment, there is never more than one value in a Var column for each row. So it should be easy to rewrite the NAs with the corresponding "real value". I also found similiar questions but the answer did not help in my case:

How to combine rows with the same identifier R?

I think the problem in my case is, that I have columns like "date", "N_date" (which is the number of observations on that date) and "type". In these cases my code should see, that it is exactly the same value for the corresponding ID, and just take the first value for example.

So that in the end I just have 3 rows with same number of columns, containing all information.

Thank you very much for anyone who has an idea how to solve this.


Solution

  • Something like this: Here we first group for all except the Var variables, then we use summarise(across... as suggested by @Limey in the comments section. Main feature is to use na.rm=TRUE:

    library(dplyr)
    
    df %>% 
      group_by(ID, Date, N_Date, type) %>% 
      summarise(across(starts_with("Var"), ~sum(., na.rm = TRUE)))
    
         ID Date   N_Date type    Var1  Var2  Var3  Var4
      <int> <chr>   <int> <chr>  <int> <int> <int> <int>
    1     1 4.7.22  50000 normal    12    23     5    54
    2     2 4.7.22   4000 normal     0     2     0     0
    3     3 5.7.22  20000 normal     7     0     0     0