I have a data frame with 100 records including bmi class (over or below 30), waist circumference class (over or below threshold) and outcome variable (deceased 0 or 1).
set.seed(1)
data <-
tibble(bmiclass=sample(x=c(0,1), size=100, replace = TRUE),
wcclass=sample(x=c(0,1), size=100, replace = TRUE),
deceased=sample(x=c(0,1), size=100, replace = TRUE))
I'd need to get two information in the same table: 1) percentage of subjects in the higher WC class by BMI group and 2) the risk of death by BMI group and WC class. I managed to do this by joining two dplyr::group_by and dplyr::summarise by left_join function as follows:
data %>% group_by(bmiclass, wcclass) %>% dplyr::summarise(risk.death=sum(deceased)/n()*100) %>%
left_join(data %>% group_by(bmiclass) %>% dplyr::summarise(risk.wc=sum(wcclass)/n()*100), by="bmiclass")
BUT i'm wondering if there is a more straightforward way to do it simpler without left_join?
This will equivalently do the same thing
data %>%
group_by(bmiclass) %>%
mutate(risk.wc = sum(wcclass)/n()*100) %>%
group_by(bmiclass, wcclass, risk.wc) %>% summarise(risk.death=sum(deceased)/n()*100)
# A tibble: 4 x 4
# Groups: bmiclass, wcclass [4]
bmiclass wcclass risk.wc risk.death
<dbl> <dbl> <dbl> <dbl>
1 0 0 49.0 52
2 0 1 49.0 50
3 1 0 45.1 64.3
4 1 1 45.1 56.5
Check it with your code
> data %>% group_by(bmiclass, wcclass) %>% dplyr::summarise(risk.death=sum(deceased)/n()*100) %>%
+ left_join(data %>% group_by(bmiclass) %>% dplyr::summarise(risk.wc=sum(wcclass)/n()*100), by="bmiclass")
`summarise()` has grouped output by 'bmiclass'. You can override using the `.groups` argument.
# A tibble: 4 x 4
# Groups: bmiclass [2]
bmiclass wcclass risk.death risk.wc
<dbl> <dbl> <dbl> <dbl>
1 0 0 52 49.0
2 0 1 50 49.0
3 1 0 64.3 45.1
4 1 1 56.5 45.1