Here's an example of some data that I have
dput(df)
structure(list(ID = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "b",
"c", "d", "e", "f", "j", "k", "n", "m", "q", "r"), Number = c(1,
2, 1, 3, 4, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,
1, 2, 1, 1, 1, 1, 1, 2, 2), Location = c(1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2), Surveyor = c("JKK", "JKK", "JKK", "JKK", "JKK", "JKK",
"JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK",
"JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK",
"JKK", "JKK", "JKK", "JKK", "JKK", "JKK", "JKK")), row.names = c(NA,
-31L), spec = structure(list(cols = list(ID = structure(list(), class = c("collector_character",
"collector")), Number = structure(list(), class = c("collector_double",
"collector")), Location = structure(list(), class = c("collector_double",
"collector")), Surveyor = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x00000253510611f0>, class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
In the above example I have the col ID
which contains the letters a-t (my real data have other letter codes), in Location
1 all of these codes are present. However in Location
2 some of them are missing.
I would like to add 0
into the Number
col where there are missing IDs.
My real data has 25 locations so ideally I'd like to be able to grab the full list of IDs and check that against each Location and add 0 where they are missing.
I've tried tidyverse::mutate
with case_when
but I'm not getting anywhere. Any help appreciated.
You can use tidyr::complete()
for this. The fill
parameter allows you to set the values in the columns you create.
df |>
tidyr::complete(
ID,
Location,
fill = list(Number = 0)
)
For the new values, the output will have 0
for Number
and NA
for any columns not explicitly set, i.e. Surveyor
.
I assigned the above to a variable called new_df
so you can easily compare with the original df
:
dplyr::anti_join(new_df, df, by = c("ID", "Location"))
# A tibble: 9 × 4
ID Location Number Surveyor
<chr> <dbl> <dbl> <chr>
1 a 2 0 NA
2 g 2 0 NA
3 h 2 0 NA
4 i 2 0 NA
5 l 2 0 NA
6 o 2 0 NA
7 p 2 0 NA
8 s 2 0 NA
9 t 2 0 NA