I have a problem that I can not fix myself, having tried multiple ways myself.
Goal: I want to winsorize individual variables for different event years.
That is, I want to winsorize all observations for variable var 1 (and var 2, var 3 etc.) for each event year (-5,-4,-3...+4,+5)
Imagine the following data structure (pdata.frame, object of the plm package, behaves similarly to a normal data frame as far as I am aware):
#firm-year observations with different numeric variables, sequences only for illustrative purposes, true variables are empirical observations
firms <- c(rep("firm a",11), rep("firm b", 11))
years <- seq(-5,5,1)
var1 <- seq(0,2.1,0.1)
var2 <- seq(0,21,1)
var3 <- seq(0,1,1/21)
#create panel data
df <- data.frame(cbind(firms,years,var1,var2,var3))
library(plm)
p.df <- pdata.frame(df, index=c("firms","years"))
# NAs in my variables exist
p.df[12,3] = NA
p.df[10,5] = NA
p.df
My data frame then looks like this:
firms years var1 var2 var3
firm a--1 firm a -1 0.4 4 0.19047619047619
firm a--2 firm a -2 0.3 3 0.142857142857143
firm a--3 firm a -3 0.2 2 0.0952380952380952
firm a--4 firm a -4 0.1 1 0.0476190476190476
firm a--5 firm a -5 0 0 0
firm a-0 firm a 0 0.5 5 0.238095238095238
firm a-1 firm a 1 0.6 6 0.285714285714286
firm a-2 firm a 2 0.7 7 0.333333333333333
firm a-3 firm a 3 0.8 8 0.380952380952381
firm a-4 firm a 4 0.9 9 <NA>
firm a-5 firm a 5 1 10 0.476190476190476
firm b--1 firm b -1 <NA> 15 0.714285714285714
firm b--2 firm b -2 1.4 14 0.666666666666667
firm b--3 firm b -3 1.3 13 0.619047619047619
firm b--4 firm b -4 1.2 12 0.571428571428571
firm b--5 firm b -5 1.1 11 0.523809523809524
firm b-0 firm b 0 1.6 16 0.761904761904762
firm b-1 firm b 1 1.7 17 0.80952380952381
firm b-2 firm b 2 1.8 18 0.857142857142857
firm b-3 firm b 3 1.9 19 0.904761904761905
firm b-4 firm b 4 2 20 0.952380952380952
firm b-5 firm b 5 2.1 21 1
My obviously has more observations and variables.
The closest solution I found was by Masato.
robustHD::winsorize
, which Masato used, is not compatible with data that has NAs though. DescTools:Winsorize
in turn does not accept data frames and will winsorize my data over all the variable columns which does not make sense.
I guess the solution lies somewhere in writing another for loop and using DescTools::Winsorize
on the individual columns after having extracted them for single event years (-5 to +5).
Another way would be to make the robustHD::winsorize
function somehow handle NAs and apply it to the whole frame of var1 to var3.
I highly appreciate your help on this and would love to here your suggestions.
Best regards, cork
Try this:
library(tidyverse)
p.df%>%
as.tibble()%>%
group_by(years)%>%
mutate(across(c(var1,var2,var3),as.double))%>%
mutate(across(c(var1,var2,var3),~DescTools::Winsorize(.x,na.rm=T)))
```