How to summarise and subset multi-level grouped dataframe in dplyr and R


I have the following data in long format:

testdf <- tibble(
          name = c(rep("john", 4), rep("joe", 2)), 
          rep = c(1, 1, 2, 2, 1, 1), 
          field = rep(c("pet", "age"), 3), 
          value = c("dog", "young", "cat", "old", "fish", "young")
)

For each named person (John and Joe), I want to summarise each of their pets in one sentence.
For some reason I can't get this to work with the repeated pets in John's data.
If I filter for Joe only (who has just one pet), the code works.

Any help is much appreciated.

testdf %>%
  group_by(name, rep) %>%
  # filter(name == "joe") %>%  # when I filter only for Joe, the code works
  summarise(
    about = paste0(
      "The pet is a: ", .[field == "pet", "value"], " and it is ", .[field == "age", "value"]
    )
  )

Solution

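    library(dplyr)
    library(tidyr)
    # Reshape to one row per pet, then build the sentence from the new columns
    testdf %>%
      pivot_wider(id_cols = name:rep, names_from = field, values_from = value) %>%
      mutate(about = paste0("The pet is a: ", pet, " and it is ", age))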
    
      name    rep pet   age   about                             
      <chr> <dbl> <chr> <chr> <chr>                             
    1 john      1 dog   young The pet is a: dog and it is young 
    2 john      2 cat   old   The pet is a: cat and it is old   
    3 joe       1 fish  young The pet is a: fish and it is young
    

    This can also be done with data.table, as follows:

    library(data.table)
    
    setDT(testdf)[
      ,j = .(about = paste0("The pet is a ", .SD[field=="pet",value], " and it is ", .SD[field=="age",value])),
      by = .(name,rep)
    ]
    
       name rep                             about
    1: john   1  The pet is a dog and it is young
    2: john   2    The pet is a cat and it is old
    3:  joe   1 The pet is a fish and it is young
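
The original summarise() attempt most likely fails because `.` is the magrittr placeholder for the whole piped data frame rather than the current group. Inside summarise(), `field == "pet"` is evaluated per group, so its length no longer matches the number of rows of `.` once there is more than one group. The following is a minimal sketch of a dplyr-only fix that indexes the `value` column directly within each group. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and exactly one "pet" row and one "age" row per name/rep combination; the name `result` is only illustrative.

```r
library(dplyr)

# Same data as in the question
testdf <- tibble(
  name  = c(rep("john", 4), rep("joe", 2)),
  rep   = c(1, 1, 2, 2, 1, 1),
  field = rep(c("pet", "age"), 3),
  value = c("dog", "young", "cat", "old", "fish", "young")
)

result <- testdf %>%
  group_by(name, rep) %>%
  summarise(
    # Bare column names refer to the current group's rows,
    # so this subsetting is done separately for each name/rep pair
    about = paste0(
      "The pet is a: ", value[field == "pet"],
      " and it is ", value[field == "age"]
    ),
    .groups = "drop"  # return an ungrouped tibble
  )

print(result)
```

This keeps the data in long format, which may be preferable when only the sentence is needed; the pivot_wider() approach is handier if the `pet` and `age` columns are wanted as well.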