Tags: r, database, dataframe, data-wrangling

Fill in missing rows


I have a data frame of county executives and the year they were inaugurated.

I am running a panel study with county-year as the unit of analysis. The date range is 2000 to 2004.

I would like to expand the df so that it lists who the county executive was in each year from 2000 to 2004.

For instance, I would like this df

df <- data.frame(year = c(2000, 2001, 2003, 2000, 2002, 2004),
                 executive.name = c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson"),
                 party = c("PartyRed", "PartyYellow", "PartyGreen", "PartyYellow", "PartyOrange", "PartyRed"),
                 district = rep(c(1001, 1002), each = 3))

to look like this

df.neat <- data.frame(year = c(2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002, 2003, 2004),
                      executive.name = c("Johnson", "Smith", "Smith", "Alleghany", "Alleghany", "Roberts", "Roberts", "Clarke", "Clarke", "Tollson"),
                      party = c("PartyRed", "PartyYellow", "PartyYellow", "PartyGreen", "PartyGreen", "PartyYellow", "PartyYellow", "PartyOrange", "PartyOrange", "PartyRed"),
                      district = rep(c(1001, 1002), each = 5))

Solution

  • df |>
      tidyr::complete(district, year) |>
      dplyr::group_by(district) |>
      tidyr::fill(executive.name, party) |>
      dplyr::ungroup()
    

    Result

    # A tibble: 10 × 4
       district  year executive.name party      
          <dbl> <dbl> <chr>          <chr>      
     1     1001  2000 Johnson        PartyRed   
     2     1001  2001 Smith          PartyYellow
     3     1001  2002 Smith          PartyYellow
     4     1001  2003 Alleghany      PartyGreen 
     5     1001  2004 Alleghany      PartyGreen 
     6     1002  2000 Roberts        PartyYellow
     7     1002  2001 Roberts        PartyYellow
     8     1002  2002 Clarke         PartyOrange
     9     1002  2003 Clarke         PartyOrange
    10     1002  2004 Tollson        PartyRed
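
A note on the approach above: `complete(district, year)` only creates years that already appear somewhere in the data. That works here because the two districts together happen to cover every year from 2000 to 2004. If some year were missing from every district, it would not be added. The variant below is a minimal sketch that pins the range explicitly. The names `panel_years` and `df_panel` are made up for illustration, and the 2000 to 2004 range is taken from the question.

```r
library(tidyr)
library(dplyr)

# Input data, copied from the question.
df <- data.frame(
  year = c(2000, 2001, 2003, 2000, 2002, 2004),
  executive.name = c("Johnson", "Smith", "Alleghany", "Roberts", "Clarke", "Tollson"),
  party = c("PartyRed", "PartyYellow", "PartyGreen", "PartyYellow", "PartyOrange", "PartyRed"),
  district = rep(c(1001, 1002), each = 3)
)

# Assumption: the panel runs over exactly 2000-2004, as stated in the question.
# full_seq() returns a double vector, matching the type of df$year.
panel_years <- full_seq(c(2000, 2004), period = 1)

df_panel <- df |>
  # Add one row per district for every year in the panel; new rows get NA.
  complete(district, year = panel_years) |>
  # Fill within each district so values never leak across districts.
  group_by(district) |>
  # Carry the most recent executive and party forward into the NA rows.
  fill(executive.name, party, .direction = "down") |>
  ungroup()

df_panel
```

Because the fill only runs downward, a district whose first recorded inauguration falls after 2000 would keep `NA` for the earlier years, which is probably what you want when the earlier executive is unknown.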