r dataframe dplyr aggregate dummy-variable

create dummy variables while aggregating in R


I want to create dummy variables while aggregating a data frame in R.

dat <- read.table(textConnection('ID Score Info
1     1     A    1
2     1     A    10
3     1     B    7
4     2     C    8
5     2     B    9
6     2     B    1
7     3     B    7
8     3     C    8
9     3     C    3
10     3     A    2'))

Basically, I want to aggregate by "ID" and count the number of rows with the same ID, which is easy, but I also need to create dummy variables based on the column "Score". If A, B, or C is present within an ID, the corresponding dummy variable is 1, and 0 otherwise. For example, there is no A for ID 2, so Score_A for ID 2 is 0, while Score_B and Score_C are 1.

Output would be as follows:

  ID Count Score_A Score_B Score_C
1  1     3       1       1       0
2  2     3       0       1       1
3  3     4       1       1       1

Any help would be much appreciated.


Solution

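  • Group by 'ID' and create a frequency column 'Count' with n(). Then add 'Score' (prefixed with "Score_") as a second grouping variable, summarise each ID/Score group to the first value of 'Count' plus a constant column of 1, and spread to 'wide' format, filling the missing ID/Score combinations with 0.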

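    library(dplyr)
    library(tidyr)

    dat %>%
        group_by(ID) %>%
        mutate(Count = n()) %>%                        # number of rows per ID
        # add Score as a second grouping level (dplyr >= 1.0.0 names this argument `.add`)
        group_by(Score = paste0("Score_", Score), add = TRUE) %>%
        summarise(Count = first(Count), n1 = 1)  %>%   # one row per ID/Score pair
        spread(Score, n1, fill = 0)                    # wide format, 0 where a Score is absent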
    # A tibble: 3 x 5
    # Groups: ID [3]
    #     ID Count Score_A Score_B Score_C
    #* <int> <int>   <dbl>   <dbl>   <dbl>
    #1     1     3    1.00    1.00    0   
    #2     2     3    0       1.00    1.00
    #3     3     4    1.00    1.00    1.00
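
  • In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same idea with the newer verbs, not a drop-in replacement for the code above. It assumes dplyr >= 0.8.0 and tidyr >= 1.0.0 are installed, rebuilds the question's data with data.frame() so that it runs on its own, and uses a helper column named present, which is a name made up for this example.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt so the snippet is self-contained
dat <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Score = c("A", "A", "B", "C", "B", "B", "B", "C", "C", "A"),
  Info  = c(1, 10, 7, 8, 9, 1, 7, 8, 3, 2),
  stringsAsFactors = FALSE
)

res <- dat %>%
  add_count(ID, name = "Count") %>%   # rows per ID, repeated on every row
  distinct(ID, Count, Score) %>%      # keep one row per ID/Score pair
  mutate(present = 1L) %>%            # hypothetical helper column: 1 = Score seen for this ID
  pivot_wider(
    names_from   = Score,
    values_from  = present,
    names_prefix = "Score_",
    values_fill  = list(present = 0L) # ID/Score pairs that never occur become 0
  )

print(res)
```

    The result has one row per ID with the columns ID, Count, Score_A, Score_B and Score_C, and the values match the desired output in the question. Because distinct() drops duplicate ID/Score pairs before reshaping, pivot_wider() never sees more than one value per cell.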