This is a follow up question to Kikapp's answer.
I want to remove participant IDs which lack all the time-points. In other way around, I want to select rows which have all the four time (11, 21, 31, 41). See the sample data dropbox link
Here is my try based on Kikapp's answer. For some reason, it doesn't work. Let me know how to make it better.
data2 <- df[df$ID %in% names(table(df$ID))[table(df$ID) > 3],]
I get 4695 rows or objects or IDs for time == 11
, time == 21
,time == 41
while 4693 for time == 31
; however, I want they should be equal.
You can use dplyr
for this task for a much faster result:
df1 <- df %>% group_by(ID) %>%
dplyr::filter(length(unique(time)) == 4) %>%
data.frame()
However, when you get the counts of ID
's for each time
value you will find out that there are 32
's hidden there (2 rows in total):
df1 %>% group_by(time) %>%
dplyr::summarise(Counts = n()) %>%
data.frame()
#Output:
time Counts
11 4695
21 4695
31 4693
32 2
41 4695
This shows that you have 2 rows with values of 32
. As it turns out, that was due to a typo on your part. So you can change them with df$time[df$time == 32] <- 31
and run the code again.
I hope this was helpful.
Thanks!