I have a tidy dataset with an slightly unusual structure in some columns. The data structure is outlined in an example below - there is a header row containing all the key information, with records for each bank (rightmost column) in the rows below but with NA instead of carrying forward the key information. I would like to use tidy::fill to carry forward the last non-NA observation, however for one column there is usually another entry below the header row (see example). Is it possible to use fill to choose the last but one non-NA value, such that I can fill it in with the correct information? Thanks for any help!
library(tidyverse)
df <- tibble(amount = c(2000, NA, NA, NA, 2500, NA, NA, NA),
amount_thismkt = c(1000, NA, NA, NA, 2500, NA, NA, NA),
nation = c("United States", NA, NA, NA, "China", NA, NA, NA),
purpose = c("Refinancing", "GCP", NA, NA, "GCP", "Refinancing", NA, NA),
bank = c("A", "B", "C", "D", "A", "E", "F", "B"))
df
# # A tibble: 8 × 5
# amount amount_thismkt nation purpose bank
# <dbl> <dbl> <chr> <chr> <chr>
# 1 2000 1000 United States Refinancing A
# 2 NA NA NA GCP B
# 3 NA NA NA NA C
# 4 NA NA NA NA D
# 5 2500 2500 China GCP A
# 6 NA NA NA Refinancing E
# 7 NA NA NA NA F
# 8 NA NA NA NA B
df %>% fill(amount, amount_thismkt, nation)
# # A tibble: 8 × 5
# amount amount_thismkt nation purpose bank
# <dbl> <dbl> <chr> <chr> <chr>
# 1 2000 1000 United States Refinancing A
# 2 2000 1000 United States GCP B
# 3 2000 1000 United States NA C
# 4 2000 1000 United States NA D
# 5 2500 2500 China GCP A
# 6 2500 2500 China Refinancing E
# 7 2500 2500 China NA F
# 8 2500 2500 China NA B
Unsure if you want to remove the last value, and just fill with the first value, or just leave the last value in place. To do the first, you can do:
df |>
mutate(purpose = ifelse(!is.na(lag(purpose)), NA, purpose)) |>
fill(amount, amount_thismkt, nation, purpose)
Output:
# A tibble: 8 × 5
amount amount_thismkt nation purpose bank
<dbl> <dbl> <chr> <chr> <chr>
1 2000 1000 United States Refinancing A
2 2000 1000 United States Refinancing B
3 2000 1000 United States Refinancing C
4 2000 1000 United States Refinancing D
5 2500 2500 China GCP A
6 2500 2500 China GCP E
7 2500 2500 China GCP F
8 2500 2500 China GCP B
To do the second, you can do:
df |>
fill(amount, amount_thismkt, nation) |>
mutate(purpose = ifelse(is.na(purpose), first(purpose), purpose), .by = nation)
Output:
# A tibble: 8 × 5
amount amount_thismkt nation purpose bank
<dbl> <dbl> <chr> <chr> <chr>
1 2000 1000 United States Refinancing A
2 2000 1000 United States GCP B
3 2000 1000 United States Refinancing C
4 2000 1000 United States Refinancing D
5 2500 2500 China GCP A
6 2500 2500 China Refinancing E
7 2500 2500 China GCP F
8 2500 2500 China GCP B