Search code examples
rgroup-bydplyrpanel-data

Homogenize grouped values in R in panel dataset


I have a household panel survey dataset. It has two surveys in it: one for individuals and one for households. One person in each household answers both while all other people in the household answer the individual survey only. The municipal location is only in the person who filled the houehold. The dataset is a panel so each observation is there multiple times for different waves of the study. So basically I have something like this:

  df <- data.frame(id = c (11,11, 12,12,13, 13,14, 14,21, 21,22, 22,31, 31,32, 32,33, 33,34, 34,41, 41,42, 42,43, 43,44, 44,51, 51,61, 61,62, 62))
df$idhousehold <- c(1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4, 5,5, 6, 6, 6,6)
df$municipality <- c(NA, NA, NA,NA, NA, NA,"A","A",NA, NA, "A", "A",NA, NA,NA, NA, "B", "B", NA,NA, "A", "A",NA,NA,NA,NA,NA,NA, "C", "C","B","B",NA, NA)
df$year <- c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)
df

What I basically want to do is to assign the same municipality value for everyone within the same household. In the example above this means everyone in households 1, 2 and 4 has value "A" for municipality, for households 3 and 6 everyone has B and C in 5. I cannot manually do that because the dataset has around 130k observations.

I tried multiple ways of goruping the data and mutating a new variable but I am not capable of getting what I want.

Thanks for the help!


Solution

  • An approach using the fill() function from tidyr:

    library(tidyverse)
    
    df <- data.frame(id = c (11,11, 12,12,13, 13,14, 14,21, 21,22, 22,31, 31,32, 32,33, 33,34, 34,41, 41,42, 42,43, 43,44, 44,51, 51,61, 61,62, 62))
    df$idhousehold <- c(1,1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4, 5,5, 6, 6, 6,6)
    df$municipality <- c(NA, NA, NA,NA, NA, NA,"A","A",NA, NA, "A", "A",NA, NA,NA, NA, "B", "B", NA,NA, "A", "A",NA,NA,NA,NA,NA,NA, "C", "C","B","B",NA, NA)
    df$year <- c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)
    
    df <- df %>% 
      group_by(idhousehold) %>% 
      fill(municipality, .direction = "updown") %>% 
      ungroup()
    
    df
    #> # A tibble: 34 x 4
    #>       id idhousehold municipality  year
    #>    <dbl>       <dbl> <chr>        <dbl>
    #>  1    11           1 A                1
    #>  2    11           1 A                2
    #>  3    12           1 A                1
    #>  4    12           1 A                2
    #>  5    13           1 A                1
    #>  6    13           1 A                2
    #>  7    14           1 A                1
    #>  8    14           1 A                2
    #>  9    21           2 A                1
    #> 10    21           2 A                2
    #> # … with 24 more rows
    

    Created on 2020-04-14 by the reprex package (v0.3.0)