I want to create dummy variables while aggregating a data frame in R.
dat <- read.table(textConnection('ID Score Info
1 1 A 1
2 1 A 10
3 1 B 7
4 2 C 8
5 2 B 9
6 2 B 1
7 3 B 7
8 3 C 8
9 3 C 3
10 3 A 2'))
Basically, I want to aggregate by "ID" and count the number of rows with the same ID, which is easy, but I also need to create dummy variables based on the column "Score". If A/B/C is present within the same ID, the corresponding dummy variable has a value of 1, and 0 otherwise. For example, there is no A for ID 2, so Score_A for ID 2 is 0, while Score_B and Score_C are both 1.
Output would be as follows:
ID Count Score_A Score_B Score_C
1 1 3 1 1 0
2 2 3 0 1 1
3 3 4 1 1 1
Any help would be much appreciated.
First create a frequency column 'Count' after grouping by 'ID'. Then add 'Score' as a second grouping variable, summarise each group to the first value of 'Count' plus a column of 1s ('n1'), and spread the result to 'wide' format, filling the missing ID/Score combinations with 0.
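library(dplyr)
library(tidyr)

dat %>%
  group_by(ID) %>%
  mutate(Count = n()) %>%                     # rows per ID
  # add Score as a second grouping level; dplyr >= 1.0.0 spells the
  # argument '.add' (older versions used 'add = TRUE')
  group_by(Score = paste0("Score_", Score), .add = TRUE) %>%
  summarise(Count = first(Count), n1 = 1) %>% # one row per ID/Score pair
  spread(Score, n1, fill = 0)                 # absent pairs become 0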
# A tibble: 3 x 5
# Groups: ID [3]
# ID Count Score_A Score_B Score_C
#* <int> <int> <dbl> <dbl> <dbl>
#1 1 3 1.00 1.00 0
#2 2 3 0 1.00 1.00
#3 3 4 1.00 1.00 1.00
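If you would rather avoid the package dependencies, the same idea can be sketched in base R. This is only a minimal sketch, not a replacement for the pipeline above: it rebuilds 'dat' from the question's data, cross-tabulates 'ID' against 'Score', and turns the counts into 0/1 indicators. The names 'counts', 'dummies' and 'result' are introduced here for illustration only.

```r
# Rebuild the example data from the question (first column = row names)
dat <- read.table(text = "ID Score Info
1 1 A 1
2 1 A 10
3 1 B 7
4 2 C 8
5 2 B 9
6 2 B 1
7 3 B 7
8 3 C 8
9 3 C 3
10 3 A 2", header = TRUE)

# Cross-tabulate: one row per ID, one column per Score level
counts <- as.data.frame.matrix(table(dat$ID, dat$Score))

# A Score is "present" for an ID when its count is above zero
dummies <- (counts > 0) * 1L
colnames(dummies) <- paste0("Score_", colnames(dummies))

# Row totals of the cross-tab give the number of rows per ID
result <- data.frame(ID = as.integer(rownames(counts)),
                     Count = rowSums(counts),
                     dummies,
                     row.names = NULL)
result
```

Because 'table()' already counts every ID/Score combination, the row totals give 'Count' and the "greater than zero" test gives the dummies, so no separate grouping step is needed.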