I have a data frame like this:
df <- data.frame(value = c("a","b","b","d","a","b","b","d","a","b","c","d"),
pattern = c("NA","a","ab","abb","bbd","bda","dab","abb","bbd","bda","dab","abc"))
The value column indicates the actual behaviour, and the pattern column shows the cumulative behaviour before this action happens. Now I want to compare each pattern with the 4 patterns in the rows above it and count the number of appearances, plus the number of appearances of the corresponding letter in the "value" column, to calculate the expected result.
The result should look like this:
   value pattern appearance a b c d exp.result
1      a      NA          0 0 0 0 0       <NA>
2      b       a          0 0 0 0 0       <NA>
3      b      ab          0 0 0 0 0       <NA>
4      d     abb          0 0 0 0 0       <NA>
5      a     bbd          0 0 0 0 0       <NA>
6      b     bda          0 0 0 0 0       <NA>
7      b     dab          0 0 0 0 0       <NA>
8      d     abb          1 0 0 0 1          d
9      a     bbd          1 1 0 0 0          a
10     b     bda          1 0 1 0 0          b
11     c     dab          1 0 1 0 0          b
12     d     abc          0 0 0 0 0       <NA>
I hope somebody can help me with this problem.
You can use this approach:
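df <- data.frame(
  value = c("a","b","b","d","a","b","b","d","a","b","c","d"),
  pattern = c(NA,"a","ab","abb","bbd","bda","dab","abb","bbd","bda","dab","abc"))
df$value <- as.factor(df$value)  # table() below needs every level, even with zero counts
win <- 4  # number of preceding rows to search
analyzeWindow <- function(idx){
  # rows in the window above the current row (none for the first row)
  idxs <- if (idx == 1) integer() else max(1, idx - win):(idx - 1)
  winDF <- df[idxs, ]
  # which() skips NA comparisons without shortening the row index
  winDF <- winDF[which(winDF$pattern == df$pattern[idx]), ]
  expValWeights <- unlist(as.list(table(winDF$value)))
  c(appearances = nrow(winDF), expValWeights)
}
newCols <- t(sapply(seq_len(nrow(df)), analyzeWindow))
df2 <- cbind(df, newCols)
# most frequent value after the pattern; rows without any match get NA
df2$exp.result <- colnames(newCols)[-1][max.col(newCols[, -1], ties.method = 'first')]
df2$exp.result[rowSums(newCols[, -1]) == 0] <- NA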
> df2
   value pattern appearances a b c d exp.result
1      a    <NA>           0 0 0 0 0       <NA>
2      b       a           0 0 0 0 0       <NA>
3      b      ab           0 0 0 0 0       <NA>
4      d     abb           0 0 0 0 0       <NA>
5      a     bbd           0 0 0 0 0       <NA>
6      b     bda           0 0 0 0 0       <NA>
7      b     dab           0 0 0 0 0       <NA>
8      d     abb           1 0 0 0 1          d
9      a     bbd           1 1 0 0 0          a
10     b     bda           1 0 1 0 0          b
11     c     dab           1 0 1 0 0          b
12     d     abc           0 0 0 0 0       <NA>
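To see what happens inside the window for a single row, here is a minimal sketch that steps through row 8 by hand. It rebuilds the same data so that it runs on its own; the names hits, counts and expected are only illustrative and do not appear in the code above, and which.max is used here in place of max.col because only one row is handled.

```r
# Same data as above; `value` is a factor so table() reports all four letters.
df <- data.frame(
  value   = factor(c("a","b","b","d","a","b","b","d","a","b","c","d")),
  pattern = c(NA,"a","ab","abb","bbd","bda","dab","abb","bbd","bda","dab","abc"),
  stringsAsFactors = FALSE)

win <- 4
idx <- 8                                   # the row being analysed

idxs  <- max(1, idx - win):(idx - 1)       # rows 4 to 7, the window above row 8
winDF <- df[idxs, ]
hits  <- winDF[which(winDF$pattern == df$pattern[idx]), ]  # window rows with pattern "abb"

appearances <- nrow(hits)                  # how often the pattern occurred in the window
counts      <- table(hits$value)           # which value followed it, counted per letter
expected    <- names(counts)[which.max(counts)]  # letter with the highest count
```

For row 8 the only match in the window is row 4, whose value is "d", which is why the output shows appearances = 1, d = 1 and exp.result = d.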
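NOTE:
This code requires the "value" column to be a factor, so that table() returns a count for every letter even when the window contains no match. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so use as.factor if it isn't one.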
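The reason for the factor requirement can be checked in isolation. The sketch below (variable names are illustrative) compares table() on an empty character vector with table() on an empty factor that still carries its levels, which is the situation for every row whose window has no match.

```r
# An empty selection, once as plain character and once as a factor with levels.
empty_chr <- character(0)
empty_fct <- factor(character(0), levels = c("a", "b", "c", "d"))

n_chr <- length(table(empty_chr))   # no entries at all
n_fct <- length(table(empty_fct))   # one zero count per level
lvl   <- names(table(empty_fct))    # the level names become the column names
```

With a character column the per-row results have different lengths, so sapply cannot assemble them into a matrix; with a factor every row yields the same four named counts.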