I'm trying to recode a variable that was not coded correctly upon data entry. It's (seemingly) tricky, and I could use a bit of direction.
The data frame (in long format) has three columns: s_id (participant identifier), i_id (item identifier), and score (binary: 1 = correct, 0 = incorrect). It's score that needs to be recoded.
This assessment was such that for each participant, items were administered until 6 consecutive items were answered incorrectly (call the 6th incorrect item the basal item). At this point, 14 additional items were administered, and all remaining items after the 14 should have been coded as missing. The problem is that all items after the 14 were coded with zeros, which makes analysis difficult.
I need a new variable, n_score, created by cycling through each participant's original scores, finding the first instance of six consecutive 0's, and counting 14 more items after that. Scores up to that point are simply copied into n_score; everything after it for that participant should be recoded as NA.
I'm in a for-loop hell, and could use some help--perhaps a clever way to tackle the problem. Below is a reproducible example of the data structure, with an added column (n_score), which is what the newly recoded variable should look like.
To generate data:
s_id <- rep(c(1:2), each = 25)
i_id <- rep(1:25, 2)
score <- c(1,1,0,0,0,0,0,0,1,0,1,0,0,0,1,1,0,0,1,0,0,0,1,1,1,
           1,1,0,0,0,0,0,0,1,0,1,0,0,0,1,1,0,0,1,0,0,0,1,1,1)
n_score <- c(1,1,0,0,0,0,0,0,1,0,1,0,0,0,1,1,0,0,1,0,0,0,NA,NA,NA,
             1,1,0,0,0,0,0,0,1,0,1,0,0,0,1,1,0,0,1,0,0,0,NA,NA,NA)
dat <- data.frame(
  s_id = s_id,
  i_id = i_id,
  score = score,
  n_score = n_score
)
This code gives the same output as n_score:
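library(plyr)  # provides ddply()
mypattern <- '000000'  # six consecutive incorrect answers
recode <- function(x) {
  # position of the first run of six zeros; -1 if there is none
  start <- regexpr(mypattern, paste(x, collapse = ''))
  if (start == -1) return(x)  # stopping rule never triggered: keep all scores
  # last administered item: the six zeros plus 14 more, capped at the vector length
  end <- min(start + 6 + 14 - 1, length(x))
  c(x[seq_len(end)], rep(NA, length(x) - end))
}
ddply(dat, .(s_id), transform, newcol = recode(score))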
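If you would rather not depend on plyr or on pasting the scores into a string, the same rule can probably be expressed with rle() and ave() from base R. This is only a sketch, not the approach above: recode_rle and its run_len and extra arguments are names I made up for illustration, and it assumes the rows are already ordered by item within each participant.

```r
# Hypothetical helper (not part of the answer above): base-R version using rle().
recode_rle <- function(x, run_len = 6, extra = 14) {
  r <- rle(x)
  # index of the first run of at least run_len zeros, if any
  hit <- which(r$values == 0 & r$lengths >= run_len)
  if (length(hit) == 0) return(x)  # stopping rule never triggered
  # position in x where that run begins
  run_start <- sum(r$lengths[seq_len(hit[1] - 1)]) + 1
  # the basal item is the run_len-th zero of the run; keep `extra` items after it
  last_kept <- run_start + run_len - 1 + extra
  if (last_kept < length(x)) x[(last_kept + 1):length(x)] <- NA
  x
}

# Rebuild the example data from the question
s_id <- rep(1:2, each = 25)
score <- rep(c(1,1,0,0,0,0,0,0,1,0,1,0,0,0,1,1,0,0,1,0,0,0,1,1,1), 2)
dat <- data.frame(s_id = s_id, i_id = rep(1:25, 2), score = score)

# ave() applies the function within each participant and keeps the row order
dat$n_score <- ave(dat$score, dat$s_id, FUN = recode_rle)
```

For the example data the run of zeros starts at item 3, so the basal item is item 8 and the last kept item is item 22; items 23 to 25 become NA for each participant.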