My dataset is like this
df
ID Year County APV sample
1 2014 A 1 1
1 2015 A 1 1
1 2016 A 0 0
1 2017 A NA 0
1 2018 A NA 0
1 2019 A NA 0
2 2014 B 1 1
2 2015 B 1 1
2 2016 B 1 1
2 2017 B 1 1
2 2018 B 0 0
2 2019 B NA 0
3 2014 A 1 1
3 2015 A 1 1
3 2016 A 0 0
3 2017 A NA 0
3 2018 A NA 0
3 2019 A NA 0
And so on
So I want to tabulate this data.
If I only want to tabulate by year
datos<-as.data.frame(table(df$APV==0 & df$sample==0, by=df$Year))
the data set that I obtain looks like this:
df1
Var1 by Freq
FALSE 2014 3
TRUE 2014 0
FALSE 2015 3
TRUE 2015 0
FALSE 2016 1
TRUE 2016 2
. . .
. . .
. . .
So false means the still open firms.
How can I tabulate by year and County?
APV tells me the first closure of the enterprise, (the 0) so I want to know how many enterprises closed by year and county
There are two approaches.
I added !is.na(APV)
for two reasons: (1) it wasn't clear to me what you expected to happen there; and (2) table
was actually more robust to NA
than xtabs
, so I wanted the two results to be the same. The premise of the two approaches are the same, but they do appear to handle NA
s differently.
You might just need to know that table
takes an arbitrary number of arguments, so
head(as.data.frame(table(df$Var1, df$Year, df$County)))
# Var1 Var2 Var3 Freq
# 1 FALSE 2014 A 2
# 2 TRUE 2014 A 0
# 3 FALSE 2015 A 2
# 4 TRUE 2015 A 0
# 5 FALSE 2016 A 0
# 6 TRUE 2016 A 2
While the names are lost, it still works.
out <- as.data.frame(
xtabs(~ Var1 + Year + County,
data = transform(df, Var1 = (!is.na(APV) & APV == 0 & sample == 0)))
)
head(out)
# Var1 Year County Freq
# 1 FALSE 2014 A 2
# 2 TRUE 2014 A 0
# 3 FALSE 2015 A 2
# 4 TRUE 2015 A 0
# 5 FALSE 2016 A 0
# 6 TRUE 2016 A 2
(I used transform
for my simplicity.)
out2 <- as.data.frame(
do.call(table, subset(transform(df, Var1 = (!is.na(APV) & APV == 0 & sample == 0)),
select = c(Var1, Year, County)))
)
(same results)