Tags: r, dplyr, tidyverse, plyr

Summarize NA and count by group


I have this test data frame:

test_df <- structure(list(plant_sp = c("plant_1", "plant_1", "plant_2", "plant_2", "plant_3",
                                       "plant_3", "plant_3", "plant_3", "plant_3", "plant_4", 
                                       "plant_4", "plant_4", "plant_4", "plant_4", "plant_4",
                                       "plant_5", "plant_5", "plant_5", "plant_5", "plant_5"), 
                          sp_rich = c(1, 1, NA, 1, NA, 
                                      1, 0, 0, NA, 0,
                                      0, 1, 0, 0, 1, 
                                      0, NA, NA, 0,NA)), 
                     row.names = c(NA, -20L), class = "data.frame", 
                     .Names = c("plant_sp", "sp_rich"))

I want to create a new data frame that summarizes this data:

the df I need

For each group, it should give the count of 1s and the number of NAs (for example, group plant_1 contains two 1s and zero NAs).

Can you help me? Thanks, Ido


Solution

  • This should work:

    library(dplyr)
    
    test_df %>%
      group_by(plant_sp) %>%
      summarize(count = sum(sp_rich > 0 & !is.na(sp_rich)),
                miss = sum(is.na(sp_rich)))
    
    # A tibble: 5 x 3
      plant_sp count  miss
      <chr>    <int> <int>
    1 plant_1      2     0
    2 plant_2      1     1
    3 plant_3      1     2
    4 plant_4      2     0
    5 plant_5      0     3
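
To see why the `!is.na(sp_rich)` guard is there, here is a small base R sketch on one group's values (plant_3 from the question). The names `x`, `count_masked`, `count_narm` and `miss` are made up for illustration. The `na.rm = TRUE` form is an alternative that should give the same count; it is not what the answer above uses.

```r
# sp_rich values for plant_3, copied from the question's data
x <- c(NA, 1, 0, 0, NA)

# Comparing NA with a number yields NA, so an unguarded sum() is NA too
unguarded <- sum(x > 0)

# NA & FALSE evaluates to FALSE, so masking with !is.na() keeps NAs out of the sum
count_masked <- sum(x > 0 & !is.na(x))

# Alternative: let sum() drop the NAs itself
count_narm <- sum(x > 0, na.rm = TRUE)

# Number of missing values in the group
miss <- sum(is.na(x))

print(c(count_masked = count_masked, count_narm = count_narm, miss = miss))
```

Either form can go inside `summarize()`; the `na.rm = TRUE` version is a little shorter, while the explicit mask makes the NA handling visible.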