I am trying to fill the NA values in this data frame with the most recent non-NA value in the cost column. I want to group by city - so all NAs for Omaha should be 44.50, and the NAs for Lincoln should be 62.50. Here is the code I have been using - it replaces the first NA (April) for each group with the correct value, but does not fill past that.
df <- df %>%
group_by(city) %>%
mutate(cost = ifelse(is.na(cost), lag(cost, na.rm=TRUE), cost))
Data before running code:
year month city cost
2021 January Omaha 45.50
2021 February Omaha 46.75
2021 March Omaha 44.50
2021 April Omaha NA
2021 May Omaha NA
2021 June Omaha NA
2021 January Lincoln 55.25
2021 February Lincoln 53.80
2021 March Lincoln 62.50
2021 April Lincoln NA
2021 May Lincoln NA
2021 June Lincoln NA
Use:
library(tidyverse)
df %>%
group_by(city) %>%
fill(cost)
# A tibble: 12 x 4
# Groups: city [2]
year month city cost
<int> <chr> <chr> <dbl>
1 2021 January Omaha 45.5
2 2021 February Omaha 46.8
3 2021 March Omaha 44.5
4 2021 April Omaha 44.5
5 2021 May Omaha 44.5
6 2021 June Omaha 44.5
7 2021 January Lincoln 55.2
8 2021 February Lincoln 53.8
9 2021 March Lincoln 62.5
10 2021 April Lincoln 62.5
11 2021 May Lincoln 62.5
12 2021 June Lincoln 62.5