I have a unbalanced panel data. the simplified example as follows:
Firm Year V1 V2 V3
AAA 2001 0 8 2
AAA 2002 10 4 2
BBB 1999 0 8 8
BBB 2000 7 0 6
BBB 2001 10 100 1
BBB 2002 1 5 8
CCC 2001 2 8 0
CCC 2002 -98 3 10
What I'd like to do is to trim outliers for each year across all frims for each Variable. The way of trim is to assign NA to those values > 99% and < 1% of the data. For example, for year 2000, for Var_1, assign NA to all the values that > 99% or < 1%. Idealy, after trimming the data will look like this:
Firm Year V1 V2 V3
AAA 2001 0 8 2
AAA 2002 10 4 2
BBB 1999 0 8 8
BBB 2000 7 0 6
BBB 2001 10 NA 1
BBB 2002 1 5 8
CCC 2001 2 8 0
CCC 2002 NA 3 10
The real data is massive with lot of unbalanced data structure.
You can try the following to mark the outliers for each variable for each year at once.
data.frame(lapply(df[3:5], # variable indices 3:5
function(y) ave(y, df$Year, FUN=function(x)
ifelse(x < quantile(x, .01) | x > quantile(x, .99), NA, x))))
# V1 V2 V3
# 1 NA 8 NA
# 2 NA 4 NA
# 3 0 8 8
# 4 7 0 6
# 5 NA NA 1
# 6 1 NA 8
# 7 2 8 NA
# 8 NA NA NA