I wanted to tidy the following data frame.
discipline applications_total applications_men applications_women awards_total awards_men awards_women
1 Chemical sciences 122 83 39 32 22 10
2 Physical sciences 174 135 39 35 26 9
3 Physics 76 67 9 20 18 2
4 Humanities 396 230 166 65 33 32
5 Technical sciences 251 189 62 43 30 13
6 Interdisciplinary 183 105 78 29 12 17
7 Earth/life sciences 282 156 126 56 38 18
8 Social sciences 834 425 409 112 65 47
9 Medical sciences 505 245 260 75 46 29
success_rates_total success_rates_men success_rates_women
1 26.2 26.5 25.6
2 20.1 19.3 23.1
3 26.3 26.9 22.2
4 16.4 14.3 19.3
5 17.1 15.9 21.0
6 15.8 11.4 21.8
7 19.9 24.4 14.3
8 13.4 15.3 11.5
9 14.9 18.8 11.2
I come up with the following scripts (after several hours researching answers here)
library(dplyr)
library(tidyr)
library(stringr)
library(dslabs)
data("research_funding_rates")
research_funding_rates
#using gather
dat <- research_funding_rates %>%
select(-applications_total,-awards_total, -success_rates_total) %>%
gather(gender1, rate, starts_with("success_rates_")) %>%
mutate(gender1 = str_sub(gender1,15)) %>%
gather(gender2, applications, starts_with("applications_")) %>%
mutate(gender2 = str_sub(gender2,14)) %>%
gather(gender, awards, starts_with("awards_")) %>%
mutate(gender = str_sub(gender,8)) %>%
filter(gender1 == gender2 & gender1 == gender) %>%
select(discipline, gender, applications, awards, rate)
#using pivot_longer
research_funding_rates %>%
select(-applications_total,-awards_total, -success_rates_total) %>%
pivot_longer(-discipline, names_to = "Name", values_to = "Count") %>%
mutate(gender = ifelse(row_number() %% 2 == 1, "men","women") ) %>%
mutate(Name = str_remove_all(Name,c("_men","_women"))) %>%
group_by(Name) %>%
spread(Name, Count, convert = TRUE) %>%
arrange(gender)
I am trying to figure out if there is a more efficient way to do this task.
I found a more efficient script using gather
.
research_funding_rates %>%
select(-applications_total,-awards_total, -success_rates_total) %>%
rename( rate_men = success_rates_men, rate_women = success_rates_women) %>%
gather(key, value, -discipline) %>% #-discipline have to come last
separate(key, c("type", "gender")) %>%
spread(type, value)
separate()
"knows" which string character to use. Using rename
and separate
should streamline the pivot_longer
solution.