Summarise but keep length variable (dplyr)

Basic dplyr question... Respondents could select multiple companies that they use. For example:

library(dplyr)
library(tidyr)  # gather() used below comes from tidyr, not dplyr

test <- tibble(
  CompanyA = rep(0:1, 5),
  CompanyB = rep(1, 10),
  CompanyC = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)
test

If it were a forced-choice question - i.e., respondents could make only one selection - I would do the following for a basic summary table:

test %>% 
  summarise_all(sum, na.rm = TRUE) %>% 
  gather(Response, n) %>% 
  arrange(desc(n)) %>% 
  mutate("%" = round(100*n/sum(n)))

Note, however, that the "%" column is not what I want: it divides each count by the total number of selections. Since respondents could make multiple selections, I'm instead looking for the proportion of all respondents who chose each individual response option.

I've tried adding mutate(totalrows = nrow(.)) %>% before the summarise_all call, so that I could use that variable as the denominator in a later mutate call. However, summarise_all eliminates the "totalrows" variable.

Also, if there's a better way to do this, I'm open to ideas.

Solution

When each variable is coded 0/1, the proportion of respondents who chose an option is simply the mean of that column. With your test data, you can use sapply:

sapply(test, mean)
CompanyA CompanyB CompanyC 
     0.5      1.0      0.8
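
If you would rather stay inside a dplyr pipeline, the same column means can probably be obtained with summarise_all. This is only a sketch of an equivalent call, assuming the same 0/1 coding as the test data; the name props is made up for illustration, and the result is a one-row tibble rather than a named vector.

```r
library(dplyr)

# Same example data as in the question
test <- tibble(
  CompanyA = rep(0:1, 5),
  CompanyB = rep(1, 10),
  CompanyC = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)

# The mean of a 0/1 column is the share of respondents who selected it
props <- test %>%
  summarise_all(mean, na.rm = TRUE)

props
```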

If your data is coded differently (say the responses are stored as 1 and 2 rather than 0 and 1), you can count the matching values explicitly instead:

test %>% 
    gather(key='Company') %>% 
    group_by(Company) %>% 
    summarise(proportion = sum(value == 1) / n())

# A tibble: 3 x 2
  Company  proportion
  <chr>         <dbl>
1 CompanyA        0.5
2 CompanyB        1  
3 CompanyC        0.8
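
To get closer to the summary table in the question (counts plus a percentage of respondents, sorted by count), one option is to store the number of rows before reshaping and use it as the denominator. This is a sketch rather than the only way to do it; total_respondents, summary_tbl and pct are names chosen here for illustration, and it assumes one row per respondent with 1 meaning "selected".

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
test <- tibble(
  CompanyA = rep(0:1, 5),
  CompanyB = rep(1, 10),
  CompanyC = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)

# Denominator: one row per respondent, captured before the data is reshaped
total_respondents <- nrow(test)

summary_tbl <- test %>%
  gather(key = "Response", value = "value") %>%    # long format: one row per answer
  group_by(Response) %>%
  summarise(n = sum(value == 1, na.rm = TRUE)) %>% # number of respondents selecting each option
  arrange(desc(n)) %>%
  mutate(pct = round(100 * n / total_respondents)) # share of respondents, not of selections

summary_tbl
```

Because the denominator is computed outside the pipeline, it survives the summarise step, which avoids the problem of carrying a helper column through summarise_all.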