My question has been answered before using zoo:: and data.table::; I'm curious what the best solution with tidyverse/dplyr would be.
Previous answers (non-tidyverse): "Forward and backward fill data frame in R" and "Replacing NAs with latest non-NA value".
My data looks like this, where the earliest two years (2015, 2016) in each country (usa, aus) have missing data (code for data input at the bottom):
#> country year value
#> 1 usa 2015 NA
#> 2 usa 2016 NA
#> 3 usa 2017 100
#> 4 usa 2018 NA
#> 5 aus 2015 NA
#> 6 aus 2016 NA
#> 7 aus 2017 50
#> 8 aus 2018 60
I would like to fill the missing values, within each country, with the value available in 2017.
I would like that fill to apply only to the years before 2017, so an NA in 2018 should not be filled in by anything. It should remain NA.
So my desired output is:
#> country year value
#> 1 usa 2015 100
#> 2 usa 2016 100
#> 3 usa 2017 100
#> 4 usa 2018 NA
#> 5 aus 2015 50
#> 6 aus 2016 50
#> 7 aus 2017 50
#> 8 aus 2018 60
I tried group_by(country), and I suspect I'm then meant to use coalesce(), but I normally use coalesce() across vectors, not along them.
library(tidyverse)
df %>% group_by(country) # %>% ... what goes next?
What's the easiest way to do this using tidyverse tools?
#install.packages("datapasta")
df <- data.frame(
stringsAsFactors = FALSE,
country = c("usa", "usa", "usa", "usa", "aus", "aus", "aus", "aus"),
year = c(2015L, 2016L, 2017L, 2018L, 2015L, 2016L, 2017L, 2018L),
value = c(NA, NA, 100L, NA, NA, NA, 50L, 60L)
)
df
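On the coalesce() idea from the question: dplyr::coalesce() recycles a length-one argument, so inside a grouped mutate() it can work along a column, not only across columns. A minimal sketch, assuming each country has at most one 2017 row; dplyr::first() is used so that a country without a 2017 row simply keeps its NAs, and if_else() restricts the fill to years before 2017. The object name res_coalesce is just illustrative.

```r
library(dplyr)

# Same data as in the question, written compactly
df <- data.frame(
  country = rep(c("usa", "aus"), each = 4),
  year = rep(2015:2018, times = 2),
  value = c(NA, NA, 100L, NA, NA, NA, 50L, 60L),
  stringsAsFactors = FALSE
)

# coalesce() works position by position; the length-one second argument
# is recycled, so every NA in the first vector is replaced by it
coalesce(c(NA, NA, 100L, NA), 100L)
#> [1] 100 100 100 100

res_coalesce <- df %>%
  group_by(country) %>%
  mutate(value = if_else(
    year < 2017,
    # first() returns NA if the country has no 2017 row
    coalesce(value, first(value[year == 2017])),
    value  # 2017 and later are left untouched, so the 2018 NA stays NA
  )) %>%
  ungroup()
```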
We can replace the NAs before 2017 with the value available in 2017 for each country.
library(dplyr)
df %>%
group_by(country) %>%
mutate(value = replace(value, is.na(value) & year < 2017, value[year == 2017]))
#Similarly with ifelse
#mutate(value = ifelse(is.na(value) & year < 2017, value[year == 2017], value))
# country year value
# <chr> <int> <int>
#1 usa 2015 100
#2 usa 2016 100
#3 usa 2017 100
#4 usa 2018 NA
#5 aus 2015 50
#6 aus 2016 50
#7 aus 2017 50
#8 aus 2018 60
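The non-tidyverse answers linked in the question use forward/backward filling; the tidyverse counterpart is tidyr::fill(), which is applied within each group of a grouped data frame. Note that .direction = "up" takes the next non-missing value below, which is not necessarily the 2017 value (if 2016 had its own value, 2015 would get that instead). A sketch under that caveat, filling a helper column (filled is just an illustrative name) and copying from it only into pre-2017 NAs:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  country = rep(c("usa", "aus"), each = 4),
  year = rep(2015:2018, times = 2),
  value = c(NA, NA, 100L, NA, NA, NA, 50L, 60L),
  stringsAsFactors = FALSE
)

res_fill <- df %>%
  group_by(country) %>%
  arrange(year, .by_group = TRUE) %>%  # an "up" fill depends on row order
  mutate(filled = value) %>%           # helper copy that fill() may overwrite
  fill(filled, .direction = "up") %>%  # next non-NA value within each country
  # only NAs before 2017 take the filled value; later NAs stay NA
  mutate(value = if_else(is.na(value) & year < 2017, filled, value)) %>%
  select(-filled) %>%
  ungroup()
```

On this particular data, fill(value, .direction = "up") alone gives the same values, because the 2018 NA for usa has no later row to fill from; the helper column only matters when a post-2017 NA is followed by a non-missing value. Also note that arrange(..., .by_group = TRUE) sorts by country first, so aus comes before usa in the result.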