I have a variable called "Item" whose levels are labeled 111, 112, 113, 114, etc. Each item is repeated 20 times, once for each subject. The items are in rows, and each is linked to the dependent variable (RT). I found that some items need to be deleted because of high error rates. What code should I write in R to delete or exclude, for example, items 111, 114, 222, and 319 from the data frame, so that I can run the analysis without these items and their RTs? I have tried running the following code with mydata, but it did not work:
Deleted <- droplevels(mydata[mydata$Item != "111, 114, 222, 319", ])
summary(Deleted)
The summary of "Deleted" still shows these items.
I have also tried:
Deleted <- names(mydata$Item) %in% c("111", "114", "214")
newdata <- qp[!Deleted]
summary(newdata)
After summary() I get the following:
Error in z[[i]] : subscript out of bounds
In addition: Warning message:
In max(unlist(lapply(z, NROW))) :
no non-missing arguments to max; returning -Inf
and for levels()
levels(newdata$Item)
NULL
I feel that I am missing something, but I cannot figure it out!
Given mydata as follows:
set.seed(1)
mydata <- data.frame(item=rep(100:400,each=20), RT=sample(0:100,6020, replace=TRUE))
Then the following all produce the same thing:
to.delete <- mydata$item %in% c(111,114,222,319) # two steps
scrubbed.1 <- mydata[!to.delete,]
scrubbed.2 <- mydata[!(mydata$item %in% c(111,114,222,319)),] # same, one step
# @MatthewLundberg's approach (he left out a comma before the right bracket)
scrubbed.3 <- droplevels(mydata[!(mydata$item %in% c(111,114,222,319)),])
identical(scrubbed.1,scrubbed.2)
# [1] TRUE
identical(scrubbed.1,scrubbed.3)
# [1] TRUE
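In the example above item is an integer column, so droplevels() has nothing to drop and all three results are identical. If Item is a factor, as the question suggests, the extra droplevels() step matters. A minimal sketch, assuming a made-up data frame fdata with a factor column Item (these names and data are hypothetical, not from the question):

```r
# Hypothetical data: Item stored as a factor, as described in the question
set.seed(1)
fdata <- data.frame(Item = factor(rep(100:400, each = 20)),
                    RT   = sample(0:100, 6020, replace = TRUE))

bad <- c("111", "114", "222", "319")        # items to exclude

kept    <- fdata[!(fdata$Item %in% bad), ]  # rows removed, empty levels retained
cleaned <- droplevels(kept)                 # unused factor levels removed as well

nlevels(kept$Item)                  # still counts the four now-empty levels
nlevels(cleaned$Item)               # four fewer levels
any(bad %in% levels(cleaned$Item))  # FALSE: excluded items no longer listed
```

Without droplevels(), summary() would still list the excluded items, just with a count of zero.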
Your first approach failed because you were comparing mydata$Item to the single string "111, 114, 222, 319". No item matches that string, so no rows were deleted.
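Both failures can be reproduced on a small factor. This is only a sketch; the vector item and its values are made up for illustration. The second attempt most likely failed because a data frame column normally has no names, so names(mydata$Item) is NULL and the resulting index is empty. It was also applied to a different object (qp) and without a comma, which selects columns rather than rows.

```r
item <- factor(c(111, 112, 114, 222))   # small hypothetical example

# One string containing commas is a single value, so nothing equals it
cmp_string <- item != "111, 114, 222, 319"
cmp_string                              # TRUE for every element: nothing dropped

# A vector of separate values with %in% tests each item against the whole set
cmp_in <- item %in% c("111", "114", "222", "319")
cmp_in                                  # TRUE only for the items to remove

# names() of a plain column is NULL, so this index has length zero
nm  <- names(item)
idx <- nm %in% c("111", "114")
length(idx)
```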