I frequently receive data from REDCap surveys in which respondents are permitted to "Check" >1 response to a survey question. Each potential response is housed in its own column. I'd like to summarize the frequency with which each response option (column) has been checked. For example:
library(tidyverse)
set.seed(1234)
responses<-c("Checked", "Unchecked")
numobs<-10
my_example<-data.frame(id=1:10,
Response_Option_A=sample(responses, numobs, replace=TRUE),
Response_Option_B=sample(responses, numobs, replace=TRUE),
Response_Option_C=sample(responses, numobs, replace=TRUE),
Response_Option_D=sample(responses, numobs, replace=TRUE),
stringsAsFactors = FALSE)
my_example
#> id Response_Option_A Response_Option_B Response_Option_C Response_Option_D
#> 1 1 Unchecked Unchecked Unchecked Checked
#> 2 2 Unchecked Unchecked Unchecked Unchecked
#> 3 3 Unchecked Unchecked Unchecked Checked
#> 4 4 Unchecked Checked Unchecked Checked
#> 5 5 Checked Unchecked Unchecked Checked
#> 6 6 Unchecked Unchecked Unchecked Unchecked
#> 7 7 Checked Unchecked Checked Checked
#> 8 8 Checked Checked Unchecked Unchecked
#> 9 9 Checked Unchecked Unchecked Unchecked
#> 10 10 Unchecked Unchecked Unchecked Checked
My initial inclination was to try this, but it returns the total number of responses checked, rather than the number in each column.
my_example %>%
select(starts_with("Response_Option_")) %>%
summarise(checked=sum(.=="Checked"))
#> checked
#> 1 13
Created on 2020-08-10 by the reprex package (v0.3.0)
Help summarizing these responses efficiently is appreciated.
This is a tidyverse
approach to show the response totals per column, not by row. I assume, by how you worded your question, that this is what you are looking for. Also included is the starts_with()
function which was included in your question tags.
We can use pivot_longer()
to pivot the response features from wide to long then use group_by
to define the variables taking the existing table and converting it to a grouped table where the summarise(
) operation is used to create a new data frame with rows provided for each combination of grouping variables.
library(tidyverse)
set.seed(1234)
responses<-c("Checked", "Unchecked")
numobs<-10
my_example<-data.frame(id=1:10,
Response_Option_A=sample(responses, numobs, replace=TRUE),
Response_Option_B=sample(responses, numobs, replace=TRUE),
Response_Option_C=sample(responses, numobs, replace=TRUE),
Response_Option_D=sample(responses, numobs, replace=TRUE),
stringsAsFactors = FALSE)
my_example %>%
pivot_longer(starts_with("Response_"), names_to = "Responses",
values_to = "value") %>%
group_by(Responses, value) %>%
summarise(total_responses = n())
#> # A tibble: 8 x 3
#> # Groups: Responses [4]
#> Responses value total_responses
#> <chr> <chr> <int>
#> 1 Response_Option_A Checked 4
#> 2 Response_Option_A Unchecked 6
#> 3 Response_Option_B Checked 2
#> 4 Response_Option_B Unchecked 8
#> 5 Response_Option_C Checked 1
#> 6 Response_Option_C Unchecked 9
#> 7 Response_Option_D Checked 6
#> 8 Response_Option_D Unchecked 4
Created on 2020-08-10 by the reprex package (v0.3.0)
If you want just the Checked
responses, you can add the following line of code after the summarise()
operation:
filter(value == "Checked")
#> # A tibble: 4 x 3
#> # Groups: Responses [4]
#> Responses value total_responses
#> <chr> <chr> <int>
#> 1 Response_Option_A Checked 4
#> 2 Response_Option_B Checked 2
#> 3 Response_Option_C Checked 1
#> 4 Response_Option_D Checked 6