I have a dataset like this:
id date value
1 8/06/12 1
1 8/08/12 1
2 8/07/12 2
2 8/08/12 1
Every id should a have a value for every date. When an id is missing a particular date, that row needs to be added with a value of 0. E.g.,
id date value
1 8/06/12 1
1 8/07/12 0
1 8/08/12 1
2 8/06/12 0
2 8/07/12 2
2 8/08/12 1
I'm trying to figure out how to add the rows with 0s. There's a good solution here: R - Fill missing dates by group. However, I can't use the tidyr::complete
function because I'm using sparklyr
and, as far as I know, need to stay within dplyr
functions.
In sparklyr
, you must use Spark functions. This is a job for coalesce
. First you have to fill out all the pairs of ids and dates you expect to see, so maybe something like:
(edit)
all_id <- old_data %>% distinct(id) %>% mutate(common=0)
all_date <- old_data %>% distinct(date) %>% mutate(common=0)
all_both <- all_id %>% full_join(all_date,by='common')
data <- old_data %>%
right_join(all_both %>% select(-common),by=c('id','date')) %>%
mutate(value=`coalesce(value,0)`)
I have assumed you have all the dates and ids you care about in your old data, though that might not be the case.