
How to do a three-way weighted table in R - similar to wtd.table


I have found many questions similar to mine, but they either don't involve weighted tables or only cover two-way tables. I am trying to do both.

Using wtd.table, I have the following line of code:

wtd.table(fulldata2$income, fulldata2$WIHH, fulldata2$hhsize, weights = fulldata2$WGTP)

The output is weighted, but it only cross-tabulates income and WIHH. It does not include hhsize.

Using the regular table function, I get the correct three-way output, but it is not weighted.

tab <- table(fulldata2$income, fulldata2$WIHH, fulldata2$hhsize)
tab2 <- prop.table(tab) 

What function can produce a frequency table that is both three-way and weighted? Ideally, it would also give proportions, like prop.table does.

Thanks!


Solution

  • First, here are some sample data (try to include these in your questions, even if it means creating a small sample data set like this one). Note that I am using the tidyverse packages here:

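    library(tidyverse)  # provides tibble(), complete(), mutate(), and %>%

    # Start from one row, expand to every combination of the three
    # variables, then use the row number as the weight
    test <-
      tibble(
        var1 = "A"
        , var2 = "b"
        , var3 = "alpha") %>%
      complete(
        var1 = c("A", "B")
        , var2 = c("a", "b")
        , var3 = c("alpha", "beta")) %>%
      mutate(wt = 1:n())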
    

    So, the data are:

    # A tibble: 8 x 4
      var1  var2  var3     wt
      <chr> <chr> <chr> <int>
    1 A     a     alpha     1
    2 A     a     beta      2
    3 A     b     alpha     3
    4 A     b     beta      4
    5 B     a     alpha     5
    6 B     a     beta      6
    7 B     b     alpha     7
    8 B     b     beta      8
    

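    The function you are looking for is xtabs. It sums the variable on the left-hand side of the formula (here, the weights) within each combination of the variables on the right-hand side: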

    xtabs(wt ~ var1 + var2 + var3
          , data = test)
    

    gives:

    , , var3 = alpha
    
        var2
    var1 a b
       A 1 3
       B 5 7
    
    , , var3 = beta
    
        var2
    var1 a b
       A 2 4
       B 6 8
    
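    Since the question also asks for proportions, here is a minimal sketch of how the xtabs result could be passed to prop.table. It rebuilds the same sample data in base R so that it runs on its own; the names tab, prop_all, and prop_by_var3 are illustrative only and are not part of the original answer.

    ```r
    # Rebuild the sample data in base R: var3 varies fastest, then var2,
    # then var1, so the weights 1..8 line up with the tibble shown above.
    test <- expand.grid(
      var3 = c("alpha", "beta"),
      var2 = c("a", "b"),
      var1 = c("A", "B"),
      stringsAsFactors = FALSE
    )
    test$wt <- seq_len(nrow(test))

    # Weighted three-way table: sums wt within each cell
    tab <- xtabs(wt ~ var1 + var2 + var3, data = test)

    # Proportions of the overall weighted total (all cells sum to 1)
    prop_all <- prop.table(tab)

    # Proportions within each level of var3 (each slice sums to 1)
    prop_by_var3 <- prop.table(tab, margin = 3)

    # ftable() flattens the three-way table into one printable block
    print(ftable(prop_by_var3))
    ```

    With no margin argument, prop.table divides by the grand total, which matches what prop.table(tab) does in the question. Passing margin = 3 normalizes within each var3 slice instead.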

    If you don't need the result to have the table class, you can also do this by just using count from dplyr (part of the tidyverse):

    test %>%
      count(var1, var2, var3
            , wt = wt)
    

    gives a tibble (a modified data.frame) with your results:

    # A tibble: 8 x 4
      var1  var2  var3      n
      <chr> <chr> <chr> <int>
    1 A     a     alpha     1
    2 A     a     beta      2
    3 A     b     alpha     3
    4 A     b     beta      4
    5 B     a     alpha     5
    6 B     a     beta      6
    7 B     b     alpha     7
    8 B     b     beta      8
    

    You can then perform whatever calculations you want on it, e.g. the proportion within each level of var3:

    test %>%
      count(var1, var2, var3
            , wt = wt) %>%
      group_by(var3) %>%
      mutate(prop_in_var3 = n / sum(n))
    

    gives:

    # A tibble: 8 x 5
    # Groups:   var3 [2]
      var1  var2  var3      n prop_in_var3
      <chr> <chr> <chr> <int>        <dbl>
    1 A     a     alpha     1       0.0625
    2 A     a     beta      2       0.1   
    3 A     b     alpha     3       0.188 
    4 A     b     beta      4       0.2   
    5 B     a     alpha     5       0.312 
    6 B     a     beta      6       0.3   
    7 B     b     alpha     7       0.438 
    8 B     b     beta      8       0.4
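
    Applied to the column names from the question, the same approach might look like the sketch below. The fulldata2 data frame here is a made-up stand-in with invented values, since the real data are not shown. Only the column names (income, WIHH, hhsize, WGTP) come from the question, and wtab and wprop are illustrative names.

    ```r
    # Hypothetical stand-in for the asker's data; the values are invented
    fulldata2 <- data.frame(
      income = c("low", "low", "high", "high", "low", "high"),
      WIHH   = c(0, 1, 1, 2, 1, 2),
      hhsize = c(1, 2, 2, 3, 2, 3),
      WGTP   = c(10, 20, 30, 40, 50, 60)
    )

    # Weighted three-way frequency table: WGTP is summed within each cell
    wtab <- xtabs(WGTP ~ income + WIHH + hhsize, data = fulldata2)

    # Weighted proportions of the grand total, as prop.table(tab) gives
    # for the unweighted table in the question
    wprop <- prop.table(wtab)

    print(ftable(wprop))
    ```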