Search code examples
rdata-manipulationdata-wrangling

Creating a table using two columns with multiple values in R


I have a sample dataset in R as follows:

ID <- c(1:10)
State <- c("AL", "AL", "AL", "TX", "TX", "TX", "TX", "TX", "WY", "WY")
QCT <- c(1, 2, 1, 2, 2, 2, 1, 1, 2, NA)

df <- data.frame(ID, State, QCT)

print(df)

   ID State QCT
1   1    AL   1
2   2    AL   2
3   3    AL   1
4   4    TX   2
5   5    TX   2
6   6    TX   2
7   7    TX   1
8   8    TX   1
9   9    WY   2
10 10    WY  NA

I want an output like the below where the rows denote the State column and the other columns are generated based on the aggregated unique values in the QCT column.

State 1 2 NA
AL    2 1 0
TX    2 3 0
WY    0 1 1

I have so far tried group-by() and summarise, also tried using the pivot_wider function from tidyr and tried the options on this thread Count number of rows per group and add result to original data frame, however, I am unable to get the desired result. This is a hypothetical dataframe, and I have a large data set of 490000 observations, where there are 38 unique values for the State column and 3 unique values (including NA which unfortunately I would need to keep as is) for the QCT column.

Thanks for your time and help!


Solution

  • In dplyr you can do.

    I added another column called NA, then I sum all the columns in the summarise function.

    library(dplyr)
    
    df |> 
      group_by(State) |> 
      mutate("NA" = is.na(QCT)) |> 
      summarise(`1` = sum(QCT == 1, na.rm = TRUE), 
                `2` = sum(QCT == 2, na.rm = TRUE),
                `NA` = sum(is.na(QCT), na.rm = TRUE), .groups = "drop")
    
    
      State   `1`   `2`  `NA`
      <chr> <int> <int> <int>
    1 AL        2     1     0
    2 TX        2     3     0
    3 WY        0     1     1