Search code examples
rdataframeaggregate

aggregate() in R not reproducible from 2011 example


Replicating a 2011 example script, the aggregate() function of base R produces NANs. I was wondering if I need to use a more recent version of aggregate or a similar function? Please advise.

Example s1s2.df can be found here: https://www.dropbox.com/s/dsqina3vuy0774u/df.csv?dl=0

Code that produces NAN instead of summarised values:

s1.no.present <- aggregate(s1s2.df$no.present[s1s2.df$sabap==-1], by=list(s1s2.df$month.n[s1s2.df$sabap==-1]),sum)[,2]
s1.no.cards <- aggregate(s1s2.df$no.cards[s1s2.df$sabap==-1], by=list(s1s2.df$month.n[s1s2.df$sabap==-1]),sum)[,2]
s2.no.present <- aggregate(s1s2.df$no.present[s1s2.df$sabap==1], by=list(s1s2.df$month.n[s1s2.df$sabap==1]),sum)[,2]
s2.no.cards <- aggregate(s1s2.df$no.cards[s1s2.df$sabap==1], by=list(s1s2.df$month.n[s1s2.df$sabap==1]),sum)[,2]

Incorrect output:

> tibble(s1.no.present)
# A tibble: 12 × 1
   s1.no.present
           <int>
 1            NA
 2            NA
 3            NA
 4            NA
 5            NA
 6            NA
 7            NA
 8            NA
 9            NA
10            NA
11            NA
12            NA

Solution

  • Use a custom sum function to remove NAs:

    #data
    s1s2.df <- read.csv("tmp.csv")
    
    mySum <- function(x){ sum(x, na.rm = TRUE) }
    aggregate(s1s2.df$no.present[s1s2.df$sabap == -1 ],
              by = list(s1s2.df$month.n[s1s2.df$sabap == -1 ]),
              mySum)
    #    Group.1    x
    # 1        1  218
    # 2        2  369
    # 3        3  590
    # 4        4 1471
    # 5        5 1880
    # 6        6 2241
    # 7        7 2306
    # 8        8 1827
    # 9        9 1377
    # 10      10  774
    # 11      11  281
    # 12      12  280
    

    Or use formulas:

    aggregate(formula = no.present ~ month.n,
              data = s1s2.df[s1s2.df$sabap == -1, ],
              FUN = sum)
    #    month.n no.present
    # 1        1        218
    # 2        2        369
    # 3        3        590
    # 4        4       1471
    # 5        5       1880
    # 6        6       2241
    # 7        7       2306
    # 8        8       1827
    # 9        9       1377
    # 10      10        774
    # 11      11        281
    # 12      12        280