Search code examples
rdataframedplyrtidyverse

Using dplyr summarize and group_by functions based on "or" condition among multiple columns of a dataframe


I can measure the proportion of 1 instances in my x column in below df:

df <- tribble(
  ~id, ~x, 
  'A',0,
  'A',1,
  'A',1,
  'B',0,
  'B',0,
  'B',1,
  'C',1,
  'C',0,
  'C',0,
  'C',1,
)


df %>%
  group_by(id) %>%
  summarise(
    result = round(prop.table(table(x))["1"]*100, digits = 2)
    )

# A tibble: 3 x 2
  id    result
  <chr>  <dbl>
1 A       66.7
2 B       33.3
3 C       50 

But what I really want is to measure the proportion of instances of 1 if any of the x, y and z column have the value of 1. For instance, in below case for category A, row one has zero value since all x, y and z values are zero but row two and three has the value of 1 because at least 1 appeared in one the x, y and z column. Therefore, again category A has the 66.7% of 1 values with group_by and summarize function used above (2 out of three rows of A category have value of 1)

df <- tribble(
  ~id, ~x, ~y, ~z
  'A',0,0,0,
  'A',1,0,0,
  'A',1,1,0,
  'B',0,0,0,
  'B',0,0,0,
  'B',1,0,0,
  'C',1,0,0,
  'C',0,0,0,
  'C',0,0,0,
  'C',1,0,0,
)

Solution

  • With mean + rowSums:

    df %>%
      group_by(id) %>%
      summarise(
        result = mean(rowSums(across(x:z)) >= 1)
      )
    
    # A tibble: 3 × 2
      id    result
      <chr>  <dbl>
    1 A      0.667
    2 B      0.333
    3 C      0.5