Search code examples
rdataframetabular

How to use table by with two arguments? R


My dataset is like this

df

ID Year  County  APV   sample 
1  2014   A       1      1
1  2015   A       1      1
1  2016   A       0      0
1  2017   A       NA     0
1  2018   A       NA     0
1  2019   A       NA     0

2  2014   B       1      1
2  2015   B       1      1
2  2016   B       1      1
2  2017   B       1      1
2  2018   B       0      0
2  2019   B       NA     0

3  2014   A       1      1
3  2015   A       1      1
3  2016   A       0      0
3  2017   A       NA     0
3  2018   A       NA     0
3  2019   A       NA     0

And so on

So I want to tabulate this data.

If I only want to tabulate by year

datos<-as.data.frame(table(df$APV==0 & df$sample==0, by=df$Year))

the data set that I obtain looks like this:

df1

Var1   by   Freq
FALSE  2014   3
TRUE   2014   0

FALSE  2015   3
TRUE   2015   0

FALSE  2016   1
TRUE   2016   2

.       .     .
.       .     .
.       .     .

So false means the still open firms.

How can I tabulate by year and County?

APV tells me the first closure of the enterprise, (the 0) so I want to know how many enterprises closed by year and county


Solution

  • There are two approaches.

    I added !is.na(APV) for two reasons: (1) it wasn't clear to me what you expected to happen there; and (2) table was actually more robust to NA than xtabs, so I wanted the two results to be the same. The premise of the two approaches are the same, but they do appear to handle NAs differently.

    table

    You might just need to know that table takes an arbitrary number of arguments, so

    head(as.data.frame(table(df$Var1, df$Year, df$County)))
    #    Var1 Var2 Var3 Freq
    # 1 FALSE 2014    A    2
    # 2  TRUE 2014    A    0
    # 3 FALSE 2015    A    2
    # 4  TRUE 2015    A    0
    # 5 FALSE 2016    A    0
    # 6  TRUE 2016    A    2
    

    While the names are lost, it still works.

    xtabs

    out <- as.data.frame(
      xtabs(~ Var1 + Year + County,
      data = transform(df, Var1 = (!is.na(APV) & APV == 0 & sample == 0)))
    )
    head(out)
    #    Var1 Year County Freq
    # 1 FALSE 2014      A    2
    # 2  TRUE 2014      A    0
    # 3 FALSE 2015      A    2
    # 4  TRUE 2015      A    0
    # 5 FALSE 2016      A    0
    # 6  TRUE 2016      A    2
    

    (I used transform for my simplicity.)

    do.call for dynamic columns

    out2 <- as.data.frame(
      do.call(table, subset(transform(df, Var1 = (!is.na(APV) & APV == 0 & sample == 0)), 
                            select = c(Var1, Year, County)))
    )
    

    (same results)