I have a data frame with time series data for multiple subjects, and I want to extract a vector of the unique UserIds of subjects who answered a set of questions on at least n_required days.
> df_all_rows
UserId Answer_Date Q1_Daily Q2_Daily Q3_Daily Q4_Daily Q5_Daily
1 1f3edec4-38c9-44f3-9931-942ccba98203 2017-01-26 7 8 8 8 5
2 202e6c2f-0b78-4ae2-9b60-4116a7241199 2017-03-11 6 4 5 3 6
3 23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-05 3 3 4 1 3
4 23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-06 3 3 4 1 2
5 23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-07 3 3 4 1 2
6 23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-08 3 3 2 2 1
7 23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-09 3 3 4 1 2
8 23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-10 3 3 4 2 2
9 2354d580-4065-404a-8a3e-154dc83900d3 2017-04-21 9 9 8 8 9
10 4ab5911d-767f-47db-b937-f1b2f3735ff7 2017-07-27 5 3 2 0 1
11 59eeda84-53cc-47fd-b2b0-23bfaa6cbde7 2017-04-04 3 2 1 5 2
12 59eeda84-53cc-47fd-b2b0-23bfaa6cbde7 2017-04-05 3 2 1 5 5
I tried the following code snippet:
subjects <- df_all_rows %>%
group_by(UserId) %>%
filter(n() >= n_required) %>%
unique(UserId)
Unfortunately, the last step fails with the following error message:
Error in isFALSE(incomparables) : object 'UserId' not found
It's easy enough to create an intermediate data frame (call it df_tmp) from the first three lines of this snippet and then write
subjects <- unique(df_tmp[["UserId"]])
but surely there must be a way to do this in a single pipeline, without creating another data frame?
I'd be very grateful for guidance with this problem.
Sincerely
Thomas Philips
Error in isFALSE(incomparables) : object 'UserId' not found
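This error means that unique() is evaluating UserId as an ordinary R object rather than as a column of your data frame. The pipe turns the last step into unique(<filtered data frame>, UserId): the data frame becomes the first argument, and UserId is matched to the second argument, incomparables. When isFALSE(incomparables) evaluates it, R looks for an object called UserId outside the data frame, and of course none exists.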
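A minimal base-R reproduction of the failure, assuming a tiny hypothetical data frame df with made-up IDs "u1" and "u2": calling unique(df, UserId) directly is what the pipe produces, and it fails with the same message, while referring to the column explicitly works.

```r
# Tiny hypothetical data frame (made-up IDs, not the question's data)
df <- data.frame(
  UserId = c("u1", "u1", "u2"),
  Answer_Date = as.Date(c("2017-04-04", "2017-04-05", "2017-04-21")),
  stringsAsFactors = FALSE
)

# df %>% unique(UserId) is equivalent to this call: UserId is matched to
# `incomparables` and evaluated outside the data frame, so it is not found.
msg <- tryCatch(unique(df, UserId), error = conditionMessage)
print(msg)

# Extracting the column first avoids the problem.
ids <- unique(df$UserId)
print(ids)
```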
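If you want to use unique(), wrap the call in braces so that %>% does not insert the data frame as the first argument, and refer to the piped data frame explicitly with the dot placeholder (.), as described in this post (Using the pipe in unique() function in r is not working):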
subjects <- df_all_rows %>%
group_by(UserId) %>%
filter(n() >= n_required) %>%
{unique(.$UserId)}
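A self-contained run of the brace version, as a sketch: it uses only the UserId and Answer_Date columns from a subset of the rows in the question, and assumes a hypothetical threshold n_required = 2, since the question does not state a value.

```r
library(dplyr)

# Subset of the question's rows; the Q*_Daily columns are not needed here
df_all_rows <- data.frame(
  UserId = c("1f3edec4-38c9-44f3-9931-942ccba98203",
             "23124514-338b-46cf-8fa8-f4fea09f3d87",
             "23124514-338b-46cf-8fa8-f4fea09f3d87",
             "23124514-338b-46cf-8fa8-f4fea09f3d87",
             "59eeda84-53cc-47fd-b2b0-23bfaa6cbde7",
             "59eeda84-53cc-47fd-b2b0-23bfaa6cbde7"),
  Answer_Date = as.Date(c("2017-01-26", "2017-04-05", "2017-04-06",
                          "2017-04-07", "2017-04-04", "2017-04-05")),
  stringsAsFactors = FALSE
)
n_required <- 2  # hypothetical threshold, not given in the question

subjects <- df_all_rows %>%
  group_by(UserId) %>%
  filter(n() >= n_required) %>%  # keep users with at least n_required rows
  {unique(.$UserId)}             # braces: `.` is used explicitly, not inserted as first argument

print(subjects)
```

Here the first user has only one row and is dropped, so subjects holds the two remaining IDs in order of first appearance.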
As user MrFlick pointed out in the comments, you can also use pull() if you want a pure tidyverse version:
subjects <- df_all_rows %>%
group_by(UserId) %>%
filter(n() >= n_required) %>%
pull(UserId) %>% unique()
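One caveat, offered as an assumption about the data rather than something stated in the question: n() counts rows, not days. If a subject could ever have two rows with the same Answer_Date, a variant of the pull() pipeline that counts distinct dates with n_distinct() matches the "number of days" criterion more closely. The IDs "a" and "b" and the duplicated date below are hypothetical.

```r
library(dplyr)

# Hypothetical data: user "a" has two rows on the same day, user "b" has three days
df_all_rows <- data.frame(
  UserId = c("a", "a", "b", "b", "b"),
  Answer_Date = as.Date(c("2017-04-04", "2017-04-04",
                          "2017-04-05", "2017-04-06", "2017-04-07")),
  stringsAsFactors = FALSE
)
n_required <- 2  # hypothetical threshold

subjects <- df_all_rows %>%
  group_by(UserId) %>%
  filter(n_distinct(Answer_Date) >= n_required) %>%  # count distinct days, not rows
  pull(UserId) %>%
  unique()

print(subjects)
```

With n() in place of n_distinct(Answer_Date), user "a" would also pass the filter despite answering on only one day.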
Alternatively, you could use distinct(). See also here.
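A sketch of the distinct() route, reusing the question's rows and a hypothetical n_required = 2. Note that distinct() returns a data frame with one row per UserId, so a final pull() is still needed to get a plain vector.

```r
library(dplyr)

# Subset of the question's rows (only the columns used here)
df_all_rows <- data.frame(
  UserId = c("1f3edec4-38c9-44f3-9931-942ccba98203",
             "23124514-338b-46cf-8fa8-f4fea09f3d87",
             "23124514-338b-46cf-8fa8-f4fea09f3d87",
             "59eeda84-53cc-47fd-b2b0-23bfaa6cbde7",
             "59eeda84-53cc-47fd-b2b0-23bfaa6cbde7"),
  Answer_Date = as.Date(c("2017-01-26", "2017-04-05", "2017-04-06",
                          "2017-04-04", "2017-04-05")),
  stringsAsFactors = FALSE
)
n_required <- 2  # hypothetical threshold

subjects <- df_all_rows %>%
  group_by(UserId) %>%
  filter(n() >= n_required) %>%
  ungroup() %>%
  distinct(UserId) %>%  # one row per remaining UserId (still a data frame)
  pull(UserId)          # extract the column as a character vector

print(subjects)
```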
HTH!