Suppose there is dataframe df
age category
12 A
15 B
12 A
13 C
14 B
14 D
I want to use aggregate to find the number of occurrences in each category {A, B, C, D} for each age. The number of A, B, C and D respectively are established as columns so the output data frame should look like
age A B C D
12 2 0 0 0
13 0 0 1 0
14 0 1 0 1
15 0 1 0 0
Attempt
agdf <- aggregate(df, by=list(df$age, df$category), FUN=length)
But doing this only gives me
age category x
12 A 2
15 B 1
14 B 1
13 C 1
14 D 1
Another problem is that the original df
has other columns but they have been omitted for simplicity. Yet with this aggregate approach using FUN=length
, it turns all those other columns into the same count value as x
. How can I keep those values?
E.g.
age category x y z
12 A 2 2 2
15 B 1 1 1
14 B 1 1 1
13 C 1 1 1
14 D 1 1 1
but I want y and z to keep their original values, only need 1 count column x
How to massage it to the desired structure?
xtabs(~., df1)
category
age A B C D
12 2 0 0 0
13 0 0 1 0
14 0 1 0 1
15 0 1 0 0
table(df1)
category
age A B C D
12 2 0 0 0
13 0 0 1 0
14 0 1 0 1
15 0 1 0 0
reshape2::dcast(df1, age~category)
age A B C D
1 12 2 0 0 0
2 13 0 0 1 0
3 14 0 1 0 1
4 15 0 1 0 0
pivot_wider(df1, id_cols = age, names_from = category,
values_from = category, values_fn = length, values_fill = 0)
# A tibble: 4 × 5
age A B C D
<int> <int> <int> <int> <int>
1 12 2 0 0 0
2 15 0 1 0 0
3 13 0 0 1 0
4 14 0 1 0 1