I have searched through the forums and have not found exactly the answer to my question. I have a data set from the World Bank
library(wbstats)
Gini <- wb(indicator = c("SI.POV.GINI"),
startdate = 2005, enddate = 2020)
Gini <- Gini[,c("iso3c", "date", "value")]
names(Gini)
names(Gini)<-c("iso3c", "date", "Gini")
#Change date to numeric
class(Gini$date)
Gini$date<-as.numeric(Gini$date)
#Tibble:
# A tibble: 1,012 x 3
iso3c date Gini
<chr> <dbl> <dbl>
1 ALB 2017 33.2
2 ALB 2016 33.7
3 ALB 2015 32.9
4 ALB 2014 34.6
5 ALB 2012 29
6 ALB 2008 30
7 ALB 2005 30.6
8 DZA 2011 27.6
9 AGO 2018 51.3
10 AGO 2008 42.7
# … with 1,002 more rows
Then I try to lag this estimate by one year
#Lag Gini
lg <- function(x)c(NA, x[1:(length(x)-1)])
Lagged.Gini<-ddply(Gini, ~ iso3c, transform, Gini.lag.1 = lg(Gini))
tibble(Lagged.Gini)
# A tibble: 1,032 x 4
iso3c date Gini Gini.lag.1
<chr> <dbl> <dbl> <dbl>
1 AGO 2018 51.3 NA
2 AGO 2008 42.7 51.3
3 ALB 2017 33.2 NA
4 ALB 2016 33.7 33.2
5 ALB 2015 32.9 33.7
6 ALB 2014 34.6 32.9
7 ALB 2012 29 34.6
8 ALB 2008 30 29
9 ALB 2005 30.6 30
10 ARE 2014 32.5 NA
Unfortunately, my problem is that when years are missing the lag does not recognize that year is missing and just puts the most recent year as the lag. Ex: country "ALB"'s Gini estimate is not lagged by one year in 2012 it lags to the next year which is 2008.
I would want the final data to look the same but how I edited below -- and ideally to be able to lag for multiple years:
# A tibble: 1,032 x 4
iso3c date Gini Gini.lag.1
<chr> <dbl> <dbl> <dbl>
1 AGO 2018 51.3 NA
AGO 2017 NA 51.3
2 AGO 2008 42.7 NA
AGO 2007 NA 42.7
3 ALB 2017 33.2 NA
4 ALB 2016 33.7 33.2
5 ALB 2015 32.9 33.7
6 ALB 2014 34.6 32.9
ALB 2013 NA 29
7 ALB 2012 29 NA
8 ALB 2008 30 29
9 ALB 2005 30.6 30
10 ARE 2014 32.5 NA
pseudospin's answer is great for base R. Since you're using tibbles, here's a tidyverse version with the same effect:
Gini <- readr::read_table("
iso3c date Gini
ALB 2017 33.2
ALB 2016 33.7
ALB 2015 32.9
ALB 2014 34.6
ALB 2012 29
ALB 2008 30
ALB 2005 30.6
DZA 2011 27.6
AGO 2018 51.3
AGO 2008 42.7")
library(dplyr)
Gini %>%
transmute(iso3c, date = date - 1, Gini.lag.1 = Gini) %>%
full_join(Gini, ., by = c("iso3c", "date")) %>%
arrange(iso3c, desc(date))
# # A tibble: 17 x 4
# iso3c date Gini Gini.lag.1
# <chr> <dbl> <dbl> <dbl>
# 1 AGO 2018 51.3 NA
# 2 AGO 2017 NA 51.3
# 3 AGO 2008 42.7 NA
# 4 AGO 2007 NA 42.7
# 5 ALB 2017 33.2 NA
# 6 ALB 2016 33.7 33.2
# 7 ALB 2015 32.9 33.7
# 8 ALB 2014 34.6 32.9
# 9 ALB 2013 NA 34.6
# 10 ALB 2012 29 NA
# 11 ALB 2011 NA 29
# 12 ALB 2008 30 NA
# 13 ALB 2007 NA 30
# 14 ALB 2005 30.6 NA
# 15 ALB 2004 NA 30.6
# 16 DZA 2011 27.6 NA
# 17 DZA 2010 NA 27.6
If you need to do this n
times (one more lag each time), you can extend it programmatically this way:
Ginilags <- lapply(1:3, function(lg) {
z <- transmute(Gini, iso3c, date = date - lg, Gini)
names(z)[3] <- paste0("Gini.lag.", lg)
z
})
Reduce(function(a,b) full_join(a, b, by = c("iso3c", "date")),
c(list(Gini), Ginilags)) %>%
arrange(iso3c, desc(date))
# # A tibble: 28 x 6
# iso3c date Gini Gini.lag.1 Gini.lag.2 Gini.lag.3
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 AGO 2018 51.3 NA NA NA
# 2 AGO 2017 NA 51.3 NA NA
# 3 AGO 2016 NA NA 51.3 NA
# 4 AGO 2015 NA NA NA 51.3
# 5 AGO 2008 42.7 NA NA NA
# 6 AGO 2007 NA 42.7 NA NA
# 7 AGO 2006 NA NA 42.7 NA
# 8 AGO 2005 NA NA NA 42.7
# 9 ALB 2017 33.2 NA NA NA
# 10 ALB 2016 33.7 33.2 NA NA
# # ... with 18 more rows