I am trying to turn it into the following panel data:
As you can see in the last image, I would like to repeat the values between years for the same country, and repeat the last value for the subsequent years until the year 2020.
You can use grid.expand
to get the country / year combinations you want, then left_join
the main data frame to this, and finally fill
the missing data, ensuring you filter out any remaining NAs
.
library(dplyr)
library(tidyr)
panel <- expand.grid(year = min(df$year):2020,
country = unique(df$country),
stringsAsFactors = FALSE) %>%
left_join(df) %>%
group_by(country) %>%
fill(c("id", "regioncode", "prespowl")) %>%
filter(!is.na(id)) %>%
as.data.frame()
Which gives the following result:
panel
#> year country id regioncode prespowl
#> 1 2011 Albania 1 Europe 0.1817557
#> 2 2012 Albania 1 Europe 0.1817557
#> 3 2013 Albania 1 Europe 0.1817557
#> 4 2014 Albania 1 Europe 0.1817557
#> 5 2015 Albania 1 Europe 0.1817557
#> 6 2016 Albania 1 Europe 0.1817557
#> 7 2017 Albania 1 Europe 0.1817557
#> 8 2018 Albania 1 Europe 0.1411482
#> 9 2019 Albania 1 Europe 0.1411482
#> 10 2020 Albania 1 Europe 0.1411482
#> 11 2016 Algeria 2 Africa 0.3837466
#> 12 2017 Algeria 2 Africa 0.3837466
#> 13 2018 Algeria 2 Africa 0.4837466
#> 14 2019 Algeria 2 Africa 0.4837466
#> 15 2020 Algeria 2 Africa 0.4837466
#> 16 1999 Argentina 3 Americas 0.2887138
#> 17 2000 Argentina 3 Americas 0.2887138
#> 18 2001 Argentina 3 Americas 0.2887138
#> 19 2002 Argentina 3 Americas 0.2887138
#> 20 2003 Argentina 3 Americas 0.2887138
#> 21 2004 Argentina 3 Americas 0.2887138
#> 22 2005 Argentina 3 Americas 0.2887138
#> 23 2006 Argentina 3 Americas 0.4322523
#> 24 2007 Argentina 3 Americas 0.4322523
#> 25 2008 Argentina 3 Americas 0.4322523
#> 26 2009 Argentina 3 Americas 0.4322523
#> 27 2010 Argentina 3 Americas 0.4322523
#> 28 2011 Argentina 3 Americas 0.4322523
#> 29 2012 Argentina 3 Americas 0.4322523
#> 30 2013 Argentina 3 Americas 0.5453171
#> 31 2014 Argentina 3 Americas 0.5453171
#> 32 2015 Argentina 3 Americas 0.5453171
#> 33 2016 Argentina 3 Americas 0.5453171
#> 34 2017 Argentina 3 Americas 0.5453171
#> 35 2018 Argentina 3 Americas 0.5453171
#> 36 2019 Argentina 3 Americas 0.5453171
#> 37 2020 Argentina 3 Americas 0.5453171
Data used:
df <- read.table(text= 'country year id regioncode prespowl
Albania 2011 1 Europe 0.1817557
Albania 2018 1 Europe 0.1411482
Algeria 2016 2 Africa 0.3837466
Algeria 2018 2 Africa 0.4837466
Argentina 1999 3 Americas 0.2887138
Argentina 2006 3 Americas 0.4322523
Argentina 2013 3 Americas 0.5453171
', header = TRUE, stringsAsFactors = FALSE)