I've got some data pulled from the web that has about 15,000 rows. The date column has a date at the beginning of each 'block section' of data. About 25% of which is blank. So it's a column that will have one cell of 2010, then NA's for a while, then 2010 again. Basically, the date is the same . . . until it changes. I've pulled a date variable out and it looks like the Date_from source column in the following code:
data_frame(
Date_from_source =
c(2010,rep(NA,2),2010,rep(NA,2),2011,rep(NA,3),2011,rep(NA,2),2012),
Date_tidy = c(rep(2010,6),rep(2011,7),2012))
I'm stuck on how to code the Data_from_source column into a Date_tidy column. I've tried using a group_by and lag approach, but I'm spinning my wheels. Thanks.
You're looking for tidyr::fill
library(tidyr)
library(dplyr)
df1 <- data_frame(
Gender = sample(c('male','female'),14,T),
Date_from_source =
c(2010,rep(NA,2),2010,rep(NA,2),2011,rep(NA,3),2011,rep(NA,2),2012),
Date_tidy = c(rep(2010,6),rep(2011,7),2012))
df1 %>% group_by(Gender) %>% fill(Date_from_source)
# # A tibble: 14 x 3
# # Groups: Gender [2]
# Gender Date_from_source Date_tidy
# <chr> <dbl> <dbl>
# 1 female NA 2010
# 2 female 2011 2011
# 3 female 2011 2011
# 4 female 2011 2011
# 5 female 2011 2011
# 6 male 2010 2010
# 7 male 2010 2010
# 8 male 2010 2010
# 9 male 2010 2010
# 10 male 2010 2010
# 11 male 2010 2011
# 12 male 2010 2011
# 13 male 2010 2011
# 14 male 2012 2012