Use R's table function to cross-tabulate data grouped by another variable


Background

Here's a dataframe d:

d <- data.frame(ID = c("a","a","b","b"),                  
                product_code = c("B78","X31","C12","C12"),
                multiple_products = c(1,1,0,0),
                stringsAsFactors=FALSE)

The Problem & What I Want

I'm trying to make a cross-tabulation-style frequency table of multiple_products using base R's table function, but I want to do so by ID and not by row. Here's what I'm looking for:

0 1 
1 1 

In other words, a table that says "there's 1 ID where multiple_products equals 0, and 1 ID where it equals 1".

What I've Tried

Here's my attempt so far using dplyr:

dtable <- d %>%
  group_by(ID) %>%
  table(d$multiple_products) %>%
  ungroup()

This code runs on my real dataset without errors, but it gives me the same result that table(d$multiple_products) would, namely this:

0 1 
2 2 

Which indicates "2 rows where multiple_products equals 0, and 2 rows where it equals 1".

In the toy example I'm giving you here, this code doesn't even run, giving me the following error:

Error: Can't combine `ID` <character> and `multiple_products` <double>.

Any thoughts?


Solution

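  • table() does not respect dplyr groups, so group_by(ID) has no effect on it. Instead, group by multiple_products and count the distinct IDs in each group with n_distinct: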

    library(dplyr)
    d %>% 
        group_by(multiple_products) %>% 
        summarise(n = n_distinct(ID))
    

    Output:

    # A tibble: 2 x 2
      multiple_products     n
                  <dbl> <int>
    1                 0     1
    2                 1     1
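
  • If the result should come from base R's table function, as the question asks, one possible sketch is to reduce the data to one row per ID first and tabulate that. The names id_level and id_table are made up for illustration, and the approach assumes multiple_products is constant within each ID:

```r
# Toy data from the question
d <- data.frame(ID = c("a", "a", "b", "b"),
                product_code = c("B78", "X31", "C12", "C12"),
                multiple_products = c(1, 1, 0, 0),
                stringsAsFactors = FALSE)

# Keep one row per distinct (ID, multiple_products) pair.
# Assumption: multiple_products is constant within each ID; an ID that
# carried both values would be counted once under each value.
id_level <- unique(d[, c("ID", "multiple_products")])

# Tabulate at the ID level instead of the row level
id_table <- table(id_level$multiple_products)
print(id_table)
```

    On the toy data this gives a count of 1 for both 0 and 1, which matches the table requested in the question. It needs no packages, while the dplyr version above returns a tibble that is easier to use in a longer pipeline.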