Search code examples
rlagtidyr

How to tidy huge blank spaces in data_frame?


I've got some data pulled from the web that has about 15,000 rows. The date column has a date at the beginning of each 'block section' of data. About 25% of which is blank. So it's a column that will have one cell of 2010, then NA's for a while, then 2010 again. Basically, the date is the same . . . until it changes. I've pulled a date variable out and it looks like the Date_from source column in the following code:

data_frame(
  Date_from_source = 
    c(2010,rep(NA,2),2010,rep(NA,2),2011,rep(NA,3),2011,rep(NA,2),2012), 
  Date_tidy = c(rep(2010,6),rep(2011,7),2012))

I'm stuck on how to code the Data_from_source column into a Date_tidy column. I've tried using a group_by and lag approach, but I'm spinning my wheels. Thanks.


Solution

  • You're looking for tidyr::fill

    library(tidyr)
    library(dplyr)
    df1 <- data_frame(
      Gender = sample(c('male','female'),14,T),
      Date_from_source = 
        c(2010,rep(NA,2),2010,rep(NA,2),2011,rep(NA,3),2011,rep(NA,2),2012), 
      Date_tidy = c(rep(2010,6),rep(2011,7),2012))
    
    df1 %>% group_by(Gender) %>% fill(Date_from_source)
    
        # # A tibble: 14 x 3
        # # Groups:   Gender [2]
        # Gender Date_from_source Date_tidy
        # <chr>            <dbl>     <dbl>
        #  1 female               NA      2010
        #  2 female             2011      2011
        #  3 female             2011      2011
        #  4 female             2011      2011
        #  5 female             2011      2011
        #  6   male             2010      2010
        #  7   male             2010      2010
        #  8   male             2010      2010
        #  9   male             2010      2010
        # 10   male             2010      2010
        # 11   male             2010      2011
        # 12   male             2010      2011
        # 13   male             2010      2011
        # 14   male             2012      2012