I have a dataset like below, with unique firm-year observations. But the variable IndustryCode has some NA due to the merger earlier.
stkcd date industrycode
10 2002 .
10 2003 .
10 2004 E22
10 2005 E22
10 2006 E22
10 2007 E22
10 2008 G45
10 2009 G45
10 2010 .
10 2011 .
11 2001 .
11 2002 .
11 2003 D23
11 2004 D23
....
I want to impute NA with the value from the closest year of the same firm. For example for firm 10(stkcd=10), IndustryCode in the years before 2004 is changed to E22, the value from 2004, and NAs in the years after 2009 is replaced by G45, the value from 2009.
How can I achieve this in R?
How about apply two times the na.locf
function from zoo
?
Basically it substitutes NA
with the last value, you'll need to go backwards too.
Some data:
dat <- data.frame(
stkcd = rep(10, 10),
year = 2002:2011,
type = c(NA,NA, "E22", "E22","E22", "E22", "G45", "G45", NA, NA)
)
library(zoo)
dat$type <- na.locf(dat$type, na.rm = F) # computes NA with the last value found
dat$type <- na.locf(dat$type, na.rm = F, fromLast = T) # this because you start with NAs, so you need to go backwards too
# output:
# stkcd year type
# 1 10 2002 E22
# 2 10 2003 E22
# 3 10 2004 E22
# 4 10 2005 E22
# 5 10 2006 E22
# 6 10 2007 E22
# 7 10 2008 G45
# 8 10 2009 G45
# 9 10 2010 G45
# 10 10 2011 G45
If you have multiple company, you need to group_by
first, from dplyr
:
library(dplyr)
library(zoo)
dat %>%
group_by(stkcd) %>% # the variable used for the company name
mutate(type = na.locf(type, na.rm = F),
type = na.locf(type, na.rm = F, fromLast = T))
For example with 2 companies:
dat <- data.frame(
stkcd = c(rep(10, 10), rep(20,10)),
year = rep(2002:2011, 2),
type = c(NA,NA, "E22", "E22","E22", "E22", "G45", "G45", NA, NA,
NA,NA, "E22", "E22","E22", "E22", "G45", "G45", NA, NA)
)
dat %>%
group_by(stkcd) %>% # the variable used for the company name
mutate(type = na.locf(type, na.rm = F),
type = na.locf(type, na.rm = F, fromLast = T))
# A tibble: 20 x 3
# Groups: stkcd [2]
# stkcd year type
# <dbl> <int> <fct>
# 1 10 2002 E22
# 2 10 2003 E22
# 3 10 2004 E22
# 4 10 2005 E22
# 5 10 2006 E22
# 6 10 2007 E22
# 7 10 2008 G45
# 8 10 2009 G45
# 9 10 2010 G45
# 10 10 2011 G45
# 11 20 2002 E22
# 12 20 2003 E22
# 13 20 2004 E22
# 14 20 2005 E22
# 15 20 2006 E22
# 16 20 2007 E22
# 17 20 2008 G45
# 18 20 2009 G45
# 19 20 2010 G45
# 20 20 2011 G45