Search code examples
rtidyverseexpanddimensions

How to expand with multiple variables or dimensions


I want to expand across three dimensions in R. I want to merge county level information on a daily level from three years on a data frame which contains all counties for all years including all months with all days (e.g. 31). The problem is that not every county#day observation is available in the using data. This is because this events did not happen at particular dates in particular counties. Thus, these are zero observations for me.

In order to create my master file, I take a list of all counties. Then, I want to expand it such that I have an unique observation for each county#year#month#day combination.

I spare you the code. I have a data.frame including the counties. I would generate the years, months, and days. I used expand from tidyverse so far.

EDIT:

library(tidyverse)

# This is my list of all counties from an official source
counties <- data.frame("county" = c("A", "B" ,"c"))

# This is what I have, the data includes counties (not all),
# for year (not all),
# months (not all)
# and days (not all)

using <- data.frame("county"  = c("A", "A", "A", "B", "B", "B", "B"),
                    "year"  = c(2015,2016,2017,2015,2016,2017,2018),
                    "month" = c(1,2,7,2,3,2,4),
                    "day" = c(1,2,22,3,21,14,5))

# This is my attempt to get at least all county year combinations
county.month <- expand(counties, county, 1:12)

# But I wish I could get all county#year#month#dya combinations

Best

Daniel


Solution

  • This approach should do what you want: A tibble with all possible county/year/month/day combinations (Assuming each month has 31 days...;)) The key is to work with factors

    library(tidyverse)
    counties <- data.frame("county" = c("A", "B" ,"C"), stringsAsFactors = F)
    using <- tibble("county"  = c("A", "A", "A", "B", "B", "B", "B"),
                        "year"  = c(2015,2016,2017,2015,2016,2017,2018),
                        "month" = c(1,2,7,2,3,2,4),
                        "day" = c(1,2,22,3,21,14,5))
    
    using %>% 
      mutate_if(is.character, as_factor) %>%
      mutate_if(is.numeric, as.ordered) %>%
      mutate(county = fct_expand(county, counties$county),
             month = fct_expand(month, as.character(1:12)),
             day = fct_expand(day, as.character(1:31))) %>%
      expand(county, year, month, day) %>%
      arrange(year, month, day)
    
    # A tibble: 4,464 x 4
       county year  month day  
       <fct>  <ord> <ord> <ord>
     1 A      2015  1     1    
     2 B      2015  1     1    
     3 c      2015  1     1    
     4 A      2015  1     2    
     5 B      2015  1     2    
     6 c      2015  1     2    
     7 A      2015  1     3    
     8 B      2015  1     3    
     9 c      2015  1     3    
    10 A      2015  1     5    
    # … with 4,454 more rows