Search code examples
rdata.tablefrequency-distribution

Frequency table including zeros for unused values, on a data.table


I have a data set that is as follows:

library(data.table)

test <- data.table(structure(list(Issue.Date = structure(c(16041, 16056, 16042,15990, 15996, 16001, 15995, 15981, 15986, 15996, 15996, 16002,16015, 16020, 16025, 16032, 16023, 16084, 16077, 16102, 16104,16107, 16112, 16113, 16115, 16121, 16125, 16128, 16104, 16132,16133, 16135, 16139, 16146, 16151), class = "Date"), 
    Complaint = structure(c(1L,4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,5L, 3L, 1L, 3L, 1L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 3L,3L, 3L), .Label = c("A", "B", "C", "D", "E"), class = "factor"),
    yr = c("2013", "2013", "2013", "2013", "2013", "2013", "2013","2013", "2013", "2013", "2013", "2013", "2013", "2013", "2013","2013", "2013", "2014", "2014", "2014", "2014", "2014", "2014","2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014","2014", "2014", "2014", "2014"), 
    Month = c("2013-12", "2013-12","2013-12", "2013-10", "2013-10", "2013-10", "2013-10", "2013-10","2013-10", "2013-10", "2013-10", "2013-10", "2013-11", "2013-11","2013-11", "2013-11", "2013-11", "2014-01", "2014-01", "2014-02","2014-02", "2014-02", "2014-02", "2014-02", "2014-02", "2014-02","2014-02", "2014-02", "2014-02", "2014-03", "2014-03", "2014-03","2014-03", "2014-03", "2014-03"), 
    da = c("02", "17", "03","12", "18", "23", "17", "03", "08", "18", "18", "24", "06","11", "16", "23", "14", "14", "07", "01", "03", "06", "11","12", "14", "20", "24", "27", "03", "03", "04", "06", "10","17", "22")), 
   .Names = c("Issue.Date", "Complaint", "yr","Month", "da"), class = c("data.table", "data.frame"), row.names = c(NA,-35L)))

Basically what I would like to do is use data.table to create a frequency table that has Complaint and Count by Month. The trick is that I need it to show a Count of zero if there are no Complaints of that type for that Month. I know how to do it without showing the zeros, but I want to know how to include them.

test[ , count := .N, by = "Month,Complaint"]

Solution

  • To directly get the counts for each group:

    setkey(test, Month, Complaint)
    
    # may need to also add allow.cartesian, depending on actual data
    test[CJ(Month, Complaint, unique = TRUE), .N, by = .EACHI]
    #      Month Complaint N
    # 1: 2013-10         A 0
    # 2: 2013-10         B 0
    # 3: 2013-10         C 5
    # 4: 2013-10         D 4
    # 5: 2013-10         E 0
    # 6: 2013-11         A 1
    # 7: 2013-11         B 0
    # 8: 2013-11         C 4
    # 9: 2013-11         D 0
    #10: 2013-11         E 0
    #11: 2013-12         A 1
    #12: 2013-12         B 0
    #13: 2013-12         C 0
    #14: 2013-12         D 2
    #15: 2013-12         E 0
    #16: 2014-01         A 0
    #17: 2014-01         B 0
    #18: 2014-01         C 1
    #19: 2014-01         D 0
    #20: 2014-01         E 1
    #21: 2014-02         A 2
    #22: 2014-02         B 0
    #23: 2014-02         C 6
    #24: 2014-02         D 2
    #25: 2014-02         E 0
    #26: 2014-03         A 1
    #27: 2014-03         B 2
    #28: 2014-03         C 3
    #29: 2014-03         D 0
    #30: 2014-03         E 0
    #      Month Complaint N
    

    See first revision of the answer if you want to have the counts in the full data.table instead of summarizing.