What is the Base R equivalent of this dplyr group_by code?

The R4DS book has the following code block:

by_age2 <- gss_cat %>%
  filter(!is.na(age)) %>%
  count(age, marital) %>%
  group_by(age) %>%
  mutate(prop = n / sum(n))

Is there a simple equivalent to this code in base R? The filter can be replaced with gss_cat[!is.na(gss_cat$age), ], but after that I run into trouble. It's clearly a job for by, tapply, or aggregate, but I've not been able to find the right way. by(gss_2, with(gss_2, list(age, marital)), length) is a step in the right direction, but the output is awful.


  • We can apply proportions to the table output, after subsetting to remove the rows where age is NA (complete.cases) and selecting the two columns of interest.

    The data is from the forcats package, so load the package and get the data:

    library(forcats)
    data(gss_cat)

    Then use table/proportions as described above, with margin 1 so that each age row sums to 1:

    by_age2_base <- proportions(table(subset(gss_cat, complete.cases(age), 
           select = c(age, marital))), 1)


    head(by_age2_base, 3)
    age    No answer Never married   Separated    Divorced     Widowed     Married
      18 0.000000000   0.978021978 0.000000000 0.000000000 0.000000000 0.021978022
      19 0.000000000   0.939759036 0.000000000 0.012048193 0.004016064 0.044176707
      20 0.000000000   0.904382470 0.003984064 0.007968127 0.000000000 0.083665339

    Compare with the OP's output:

    head(by_age2, 3)
    # A tibble: 3 x 4
    # Groups:   age [2]
        age marital           n   prop
      <int> <fct>         <int>  <dbl>
    1    18 Never married    89 0.978 
    2    18 Married           2 0.0220
    3    19 Never married   234 0.940 

    If we need the output in 'long' format, convert the table to a data.frame with as.data.frame and drop the zero-count combinations:

    by_age2_base_long <- subset(as.data.frame(by_age2_base), Freq > 0)

    Another option is aggregate/ave (this needs R >= 4.1.0 for the native pipe |> and the \(x) lambda shorthand):

    subset(gss_cat, complete.cases(age), select = c(age, marital)) |>
      {\(dat) aggregate(cbind(n = age) ~ age + marital,
                        data = dat, FUN = length)}() |>
      transform(prop = ave(n, age, FUN = \(x) x / sum(x)))
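
To check the logic without installing forcats, here is a minimal sketch of both approaches on a small made-up data frame. The name `toy` and its values are hypothetical stand-ins for gss_cat, and `responseName = "n"` is used only so the count column matches the dplyr output. It assumes R >= 4.0.1, where proportions() is available.

```r
# Hypothetical toy data standing in for gss_cat (values are made up)
toy <- data.frame(
  age     = c(18, 18, 18, 19, 19, NA),
  marital = c("Never married", "Never married", "Married",
              "Never married", "Divorced", "Married")
)

# Drop rows with missing age, cross-tabulate, then take row-wise proportions
tab <- table(subset(toy, !is.na(age), select = c(age, marital)))
prop_wide <- proportions(tab, 1)

# Long format: counts per (age, marital) pair, keeping only pairs that occur,
# with the within-age proportion computed by ave()
n_long    <- as.data.frame(tab, responseName = "n")
prop_long <- transform(subset(n_long, n > 0),
                       prop = ave(n, age, FUN = function(x) x / sum(x)))
prop_long
```

In this sketch the proportions within each age group should sum to 1 in both the wide table (row sums) and the long data frame, which is the same invariant that the group_by/mutate step provides.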