r dplyr unique

Using unique to obtain a vector of UserIds that satisfy a criterion without creating an intermediate data frame


I have a data frame with time series data for multiple subjects, and I want to extract a vector of unique UserIds that meet a criterion for the number of days on which the subject answered a set of questions.

    > df_all_rows
                                 UserId Answer_Date Q1_Daily Q2_Daily Q3_Daily Q4_Daily Q5_Daily
1  1f3edec4-38c9-44f3-9931-942ccba98203 2017-01-26        7        8        8        8        5        
2  202e6c2f-0b78-4ae2-9b60-4116a7241199 2017-03-11        6        4        5        3        6        
3  23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-05        3        3        4        1        3        
4  23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-06        3        3        4        1        2        
5  23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-07        3        3        4        1        2        
6  23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-08        3        3        2        2        1        
7  23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-09        3        3        4        1        2        
8  23124514-338b-46cf-8fa8-f4fea09f3d87 2017-04-10        3        3        4        2        2        
9  2354d580-4065-404a-8a3e-154dc83900d3 2017-04-21        9        9        8        8        9        
10 4ab5911d-767f-47db-b937-f1b2f3735ff7 2017-07-27        5        3        2        0        1        
11 59eeda84-53cc-47fd-b2b0-23bfaa6cbde7 2017-04-04        3        2        1        5        2        
12 59eeda84-53cc-47fd-b2b0-23bfaa6cbde7 2017-04-05        3        2        1        5        5        

I tried the following code snippet:

    subjects <- df_all_rows %>% 
                  group_by(UserId) %>%
                  filter(n() >= n_required) %>% 
                  unique(UserId)

Unfortunately, the last step does not work. I get the following error message:

    Error in isFALSE(incomparables) : object 'UserId' not found

It's easy enough to create a new data frame (call it df_tmp) using the first three lines of this snippet and then write

    subjects <- unique(df_tmp[["UserId"]])

but surely there must be a way to do this in one step instead of two, without creating another data frame?

I'd appreciate any guidance with this problem.

Sincerely

Thomas Philips


Solution

  • Error in isFALSE(incomparables) : object 'UserId' not found
    

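    This error arises because the pipe passes the data frame as the first argument, so the last step is evaluated as unique(<data frame>, UserId). unique does not look up column names: R matches UserId to its second argument, incomparables, and evaluates it as an ordinary variable. No variable with that name exists, hence the error.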
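
    To make the cause concrete, here is a minimal sketch in base R that reproduces the error without any pipe. The data frame df and its values are made up for illustration; only the column name UserId comes from the question.

    ```r
    # Hypothetical toy data; only the column name UserId is taken from the question
    df <- data.frame(
      UserId   = c("a", "b", "b"),
      Q1_Daily = c(7, 6, 3),
      stringsAsFactors = FALSE
    )

    # This is what `df %>% unique(UserId)` amounts to: UserId is passed
    # positionally as the second argument (incomparables) and evaluated as a
    # plain variable, which does not exist, so the call fails.
    msg <- tryCatch(
      unique(df, UserId),
      error = function(e) conditionMessage(e)
    )
    print(msg)

    # Referring to the column explicitly works as expected
    ids <- unique(df$UserId)
    print(ids)
    ```

    The same reasoning applies to the grouped data frame in the question, since unique is still called with the data frame first and UserId second.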

    unique

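    If you want to use unique, wrap the call in braces. The braces stop the pipe from inserting the data frame as the first argument, so you can refer to it explicitly as . instead. This should work according to this post (Using the pipe in unique() function in r is not working).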

    subjects <- df_all_rows %>% 
                  group_by(UserId) %>%
                  filter(n() >= n_required) %>% 
                  {unique(.$UserId)}
    

    with pure tidyverse

    As user MrFlick pointed out in the comments below, if you want a pure tidyverse version you can use pull to extract the column as a vector and then pipe it into unique.

    subjects <- df_all_rows %>% 
                  group_by(UserId) %>%
                  filter(n() >= n_required) %>% 
                  pull(UserId) %>% unique()
    

    Alternatively, you could try distinct, which keeps only the unique rows. Note that it returns a data frame rather than a vector. See also here.
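
    As a rough sketch of the distinct route, assuming dplyr is installed: the toy data and the value of n_required below are invented for illustration, and pull is added at the end because distinct returns a data frame rather than a vector.

    ```r
    library(dplyr)

    # Hypothetical toy data standing in for df_all_rows
    df_all_rows <- data.frame(
      UserId      = c("a", "b", "b", "b", "c", "c"),
      Answer_Date = as.Date("2017-04-01") + 0:5,
      stringsAsFactors = FALSE
    )
    n_required <- 2  # assumed threshold, not given in the question

    subjects <- df_all_rows %>%
      group_by(UserId) %>%
      filter(n() >= n_required) %>%  # keep users with enough rows
      ungroup() %>%
      distinct(UserId) %>%           # one row per remaining UserId
      pull(UserId)                   # turn the single column into a vector

    # A shorter variant: count rows per user, then filter on the count column n
    subjects_alt <- df_all_rows %>%
      count(UserId) %>%
      filter(n >= n_required) %>%
      pull(UserId)

    print(subjects)
    print(subjects_alt)
    ```

    The count variant avoids filtering the full data first, which may be preferable when only the ids are needed.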

    HTH!