I am trying to create a count of the number of observations and extract percentages from those counts.
I have data that looks like this:
UserID substance_use
43124 0
43124 1
43124 0
43124 0
43124 1
215 1
215 1
I want a count of the total observations of substance use like this:
UserID substance_use count
43124 0 5
43124 1 5
43124 0 5
43124 0 5
43124 1 5
215 1 2
215 1 2
And then, I want the % of visits that participants said yes and no, like this:
UserID substance_use count percent_yes percent_no
43124 0 5 40% 60%
43124 1 5 40% 60%
43124 0 5 40% 60%
43124 0 5 40% 60%
43124 1 5 40% 0%
215 1 2 100% 0
215 1 2 100% 0%
I tried using the count function for the first part, but it's not taking into account the 0's. Any help would be greatly appreciated.
We may use add_count
to create the 'count' column and then get the mean
of the binary column by group to return the percent_yes
and subtract 100 from the percent_yes to return percent_no
library(dplyr)
df1 %>%
add_count(UserID, name = 'count') %>%
group_by(UserID) %>%
mutate(percent_yes = 100 * mean(substance_use),
percent_no = 100 - percent_yes) %>%
ungroup
-output
# A tibble: 7 × 5
UserID substance_use count percent_yes percent_no
<int> <int> <int> <dbl> <dbl>
1 43124 0 5 40 60
2 43124 1 5 40 60
3 43124 0 5 40 60
4 43124 0 5 40 60
5 43124 1 5 40 60
6 215 1 2 100 0
7 215 1 2 100 0
NOTE: Here, we assumed no missing values in 'substance_use' column
df1 <- structure(list(UserID = c(43124L, 43124L, 43124L, 43124L, 43124L,
215L, 215L), substance_use = c(0L, 1L, 0L, 0L, 1L, 1L, 1L)),
class = "data.frame", row.names = c(NA,
-7L))