I have a grouped data and I would like to create a new variable based on the values of the each of the rows.
> df <- data.frame(Group = c("A","A","A","B","B","B", "C", "C"), Gender=c("M","M","F","F","F","F", "M", "M"))
> df
Group Gender
1 A M
2 A M
3 A F
4 B F
5 B F
6 B F
7 C M
8 C M
In this example, I would like to now if groups of A, B and C are
So the desired outcome is:
Group Gender gender_mix
1 A M Mixed
2 A M Mixed
3 A F Mixed
4 B F Female Only
5 B F Female Only
6 B F Female Only
7 C M Male Only
8 C M Male Only
I tried using any() and all() with no luck:
> df%>%
+ group_by(Group)%>%
+ mutate(gender_mix=case_when(all(Gender)=="M"~"Male Only",
+ all(Gender)=="F"~"FemAle Only",
+ any(Gender)=="M"&any(Gender)=="F"~"Mixed",
+ TRUE~NA_character_))
# A tibble: 8 × 3
# Groups: Group [3]
Group Gender gender_mix
<chr> <chr> <chr>
1 A M NA
2 A M NA
3 A F NA
4 B F NA
5 B F NA
6 B F NA
7 C M NA
8 C M NA
There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
2: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
3: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
4: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 1: Group = "A".
5: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
6: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
7: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
8: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 2: Group = "B".
9: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
10: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
11: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
12: Problem while computing `gender_mix = case_when(...)`.
ℹ coercing argument of type 'character' to logical
ℹ The warning occurred in group 3: Group = "C".
Another issues is that my data is rather large (10M rows) and any() and all() seems to be very slow.
Any help would be much appreciated.
You should put the entire conditions between parenthesis:
df %>%
group_by(Group) %>%
mutate(gender_mix = case_when(all(Gender == "M") ~ "Male only",
all(Gender == "F") ~ "Female only",
any(Gender == "F") & any(Gender == "M") ~ "Mixed",
TRUE ~ NA_character_))