My dataframe is as follows in which gender=="1" refers to men and gender=="2" refers to women, Occupations go from A to U and year goes from 2010 to 2018 (I give you a small example)
Gender Occupation Year
1 A 2010
1 A 2010
2 A 2010
1 B 2010
2 B 2010
1 A 2011
2 A 2011
1 C 2011
2 C 2011
I want an output that sums the number of rows in which gender and year and occupation is distinct like you can see next:
Year | Occupation | Men | Woman
2010 | A | 2 | 1
2010 | B | 1 | 1
2011 | A | 1 | 1
2011 | C | 1 | 1
I have tried the following:
Nr_gender_occupation <- data %>%
group_by(year, occupation) %>%
summarise(
Men = aggregate(gender=="1" ~ occupation, FUN= count),
Women = aggregate(gender=="2" ~ occupation, FUN=count)
)
You can also do a count within your groups:
library(dplyr)
df %>%
group_by(Occupation, Year) %>%
summarize(Men = sum(Gender == 1),
Woman = sum(Gender == 2), .groups = "drop")
Output
Occupation Year Men Woman
<chr> <dbl> <int> <int>
1 A 2010 2 1
2 A 2011 1 1
3 B 2010 1 1
4 C 2011 1 1