Search code examples
rdplyrtidyrplyr

Cant use pivot wider after grouping with count


Today, I used the count() with group_by() and summarise() from the dplyr package. Here is the code:

dplyr::group_by(QUARTER, PROVINCE, DATE) %>%
  dplyr::summarise(COUNT = count(EDUCATION)) %>%
ungroup()

Here is what my df looks like after executing the code above:

Quarter    PROVINCE      DATE        COUNT.x        COUNT.freq

2020Q1   New-York       2020-01-07     1              30
2021Q1   New-York       2020-01-07     2              21
2020Q1   New-York       2020-01-07     3              55
2021Q1   New-York       2020-01-07     4              8

Note: I did not create COUNT.x and COUNT.freq myself! Now, I am trying to take the COUNT.x as columns and COUNT.freq as the values. This means pivot_wider() would be the solution here. However, after trying the following:

pivot_wider(df, names_from = COUNT.x, values_from = COUNT.freq)

It says COUNT.x and COUNT.freq do not exist:

Error: Can't subset columns that don't exist.
x Column `COUNT.x` doesn't exist

Also, checking names(df) gives me:

[1] "QUARTER"  "PROVINCE" "DATE" "COUNT"  

It does not include COUNT.x and COUNT.freq!

Anyone have a solution to this problem?


Solution

  • The issue is that count is being applied within summarise resulting in a data.frame/tibble column. It should be outside i.e.

    dplyr::count(df, QUARTER, PROVINCE, DATE, EDUCATION)
    

    If we are using the OP's code, it is not a regular data.frame column. We need to reinvoke data.frame to flatten the columns

    out <- df %>% dplyr::group_by(QUARTER, PROVINCE, DATE) %>%
      dplyr::summarise(COUNT = count(EDUCATION)) %>%
    ungroup()
    out1 <- do.call(data.frame, out)