Search code examples
rfilterfrequency-table

How do I create frequency tables for subsets of a dataframe?


I was helped earlier with applying a weight while generating a frequency table. Now I have a question about the next step, but here I'm merely looking for a better solution than the clumsy work-around I have managed to come up with.

My dataframe now has another column containing country codes, and I want to generate my weighted frequency tables separately for each country.

The work-around is this:

library(descr)

country <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
var <- c(1, 3, 2, 1, 2, 2, 3, 3, 1, 3, NA, NA)
wght <- c(0.8, 0.9, 1.2, 1.5, 0.5, 1, 0.7, 0.9, 0.8, 1.1, 1, 0.8)
df <- cbind.data.frame(country, var, wght)

df1 <- subset(df, country == 1)
freq(df1$var, df1$wght)

df2 <- subset(df, country == 2)
freq(df2$var, df2$wght)

For this example it works just fine, but I have around thirty countries in my real data. Doing it this way is tedious, and especially, I'd be hacking my data into pieces which may come to bite me in later stages of the analysis (e.g. if I want to aggregate countries into regions and compare them). Is there a cleaner, less "invasive" way?


Solution

  • One option is to use lapply afterspliting the data by country. This approach stores the data in a list, with each element of the list representing a country (ie, in your example data, it has length 2):

    lapply(split(df, df$country), 
           function(x) descr::freq(x[,"var"], x[,"wght"]))
    
    

    Output:

    $`1`
    x[, "var"] 
          Frequency Percent
    1           2.3   38.98
    2           2.7   45.76
    3           0.9   15.25
    Total       5.9  100.00
    
    $`2`
    x[, "var"] 
          Frequency Percent Valid Percent
    1           0.8   15.09         22.86
    3           2.7   50.94         77.14
    NA's        1.8   33.96              
    Total       5.3  100.00        100.00