
Grouping into desired number of groups


I have a data frame like this, where ID is the primary key and Apples is the number of apples that person has:

ID Apples
E1 10
E2 5
E3 NA
E4 5
E5 8
E6 12
E7 NA
E8 4
E9 NA
E10 8

I want to put the NA and non-NA values into just two groups and get the count of each. I tried a plain group_by(), but it does not give me the desired output:

Fruits %>% group_by(Apples) %>% summarize(n())

Apples    n()
<dbl>    <int>
 4         1            
 5         2            
 8         2            
 10        1            
 12        1            
 NA        3

My desired output:

Apples    n()
<chr>    <int>
 non-NA    7
 NA        3
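The first attempt returns six rows because group_by(Apples) groups by the value itself: each distinct number gets its own group, and all the NAs are pooled into one more. What is wanted is a grouping key that only records whether the value is missing. A quick base-R sketch contrasting the two keys (the data frame re-creates the one from the Data section; `by_value` and `by_missing` are just illustrative names):

```r
# Re-create the example data (same values as in the Data section)
df <- data.frame(
  ID = paste0("E", 1:10),
  Apples = c(10L, 5L, NA, 5L, 8L, 12L, NA, 4L, NA, 8L)
)

# Key = the value itself: one group per distinct value, NAs pooled together
by_value <- table(df$Apples, useNA = "ifany")
length(by_value)   # 6 groups: 4, 5, 8, 10, 12 and NA

# Key = whether the value is missing: only two groups, FALSE and TRUE
by_missing <- table(is.na(df$Apples))
by_missing         # FALSE = 7 (non-NA), TRUE = 3 (NA)
```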

Solution

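  • We can create the NA/non-NA grouping directly inside group_by() from is.na(Apples), and wrap it in factor() so the labels can be set in the same step. is.na() returns FALSE for non-missing values and TRUE for missing ones, and factor() orders the levels FALSE then TRUE, so labels = c("non-NA", "NA") lines up with them. Then n() gives the number of rows in each group.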

    library(dplyr)
    
    df %>% 
      group_by(grp = factor(is.na(Apples), levels = c(FALSE, TRUE), labels = c("non-NA", "NA"))) %>% 
      summarise(`n()`= n())
    
    #  grp     `n()`
    #  <fct>  <int>
    #1 non-NA     7
    #2 NA         3
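The same factor-based grouping can be done in base R without dplyr. This is a minimal sketch, not the answer's method: it re-creates the data from the Data section, and `grp` and `counts` are illustrative names. Fixing `levels = c(FALSE, TRUE)` makes the FALSE -> "non-NA", TRUE -> "NA" mapping explicit and avoids an error when a column has no missing values.

```r
# Re-create the example data (same values as in the Data section)
df <- data.frame(
  ID = paste0("E", 1:10),
  Apples = c(10L, 5L, NA, 5L, 8L, 12L, NA, 4L, NA, 8L)
)

# is.na() returns FALSE/TRUE; with levels fixed as FALSE, TRUE the labels
# map FALSE -> "non-NA" and TRUE -> "NA"
grp <- factor(is.na(df$Apples), levels = c(FALSE, TRUE),
              labels = c("non-NA", "NA"))

# table() counts rows per group; as.data.frame() turns the table into a
# data frame with one row per group and the count in column n
counts <- as.data.frame(table(Apples = grp), responseName = "n")
counts
#   Apples n
# 1 non-NA 7
# 2     NA 3
```

Without `levels = c(FALSE, TRUE)`, a column with no NAs would yield a factor with a single level, and `factor()` would stop with an error because two labels were supplied.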
    

    Or in base R, we could use colSums:

    # count non-missing and missing values in the Apples column, selected by name
    data.frame(Apples = c("non-NA", "NA"), n = c(colSums(!is.na(df))["Apples"], colSums(is.na(df))["Apples"]), row.names = NULL)
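How the colSums() version works: is.na(df) is a logical matrix with one column per column of df, and colSums() counts each TRUE as 1, so it returns the number of missing values per column; !is.na(df) gives the non-missing counts the same way. A small sketch that spells this out step by step (`na_per_col`, `non_na_per_col` and `result` are illustrative names; the data re-creates the Data section):

```r
# Re-create the example data (same values as in the Data section)
df <- data.frame(
  ID = paste0("E", 1:10),
  Apples = c(10L, 5L, NA, 5L, 8L, 12L, NA, 4L, NA, 8L)
)

# One count per column: number of missing / non-missing entries
na_per_col     <- colSums(is.na(df))    # ID: 0, Apples: 3
non_na_per_col <- colSums(!is.na(df))   # ID: 10, Apples: 7

# [[ ]] extracts a single unnamed value, so no row.names argument is needed
result <- data.frame(
  Apples = c("non-NA", "NA"),
  n = c(non_na_per_col[["Apples"]], na_per_col[["Apples"]])
)
result
```

Selecting the column by name rather than by position keeps the counts correct if columns are reordered or added.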
    

    Data

    df <- structure(list(ID = c("E1", "E2", "E3", "E4", "E5", "E6", "E7", 
    "E8", "E9", "E10"), Apples = c(10L, 5L, NA, 5L, 8L, 12L, NA, 
    4L, NA, 8L)), class = "data.frame", row.names = c(NA, -10L))