Search code examples
rdata.tableaggregategrouping

Is there a way to aggregate multiple subgroups within a larger group/column with data.table?


New R user here and I am trying to aggregate multiple groups of data within a larger group, for example Males/Females of Adults by Census Tracts. Currently I am writing it as:

DEmale <- DE_2016small[Gender_2016 == "M", .N, by = Residence_Addresses_CensusTract_2016] %>% rename(Males = N)

and a second function as:

DEfem <- DE_2016small[Gender_2016 == "F", .N, by = Residence_Addresses_CensusTract_2016] %>% rename(Females = N)

Is there any way to combine the code to find M and F as the same time, rather than as two separate commands? The tabular data is huge, and I will need to create multiple groups more efficiently than a single command at a time.


Solution

  • Another way based on data.table:

    dcast(data=DE_2016small, 
          formula=Residence_Addresses_CensusTract_2016 ~ factor(Gender_2016, c("M", "F"), c("Male", "Female")), 
          fun=length)
    
    #     Residence_Addresses_CensusTract_2016  Male Female
    #  1:                                    A    19     19
    #  2:                                    B    16     15
    #  3:                                    C    18     25
    #  4:                                    D    15     13
    #  5:                                    E    22     24
    #  6:                                    F    14     25
    #  7:                                    G    21     22
    #  8:                                    H    18     20
    #  9:                                    I    20     19
    # 10:                                    J    14     22
    # 11:                                    K    18     22
    # 12:                                    L    22     23
    # 13:                                    M    20     16
    # 14:                                    N    28     21
    # 15:                                    O    16     17
    # 16:                                    P    18     15
    # 17:                                    Q    26     22
    # 18:                                    R    20     22
    # 19:                                    S    23     26
    # 20:                                    T    18     12
    # 21:                                    U    20     11
    # 22:                                    V    18     18
    # 23:                                    W    20     22
    # 24:                                    X    21     14
    # 25:                                    Y    24     12
    # 26:                                    Z    17     17
    

    data

    set.seed(123)
    DE_2016small = data.table(Gender_2016=sample(c('M','F'),1000,replace=T),Residence_Addresses_CensusTract_2016=sample(LETTERS,1000,replace=T))