Suppose my data looks like this
student article.bought
1 A pen
2 B pencil
3 V book
4 A pen
5 A inkbottle
6 B pen
7 B pencil
8 B pencil
9 V book
10 Z marker
11 A inkbottle
12 V book
13 V pen
14 V book
I need unique occurrences of articles probably in a different column like this
student article.bought Occurences
1 A pen 1
2 B pencil 1
3 V book 1
4 A pen 1 # as A is taking a pen again
5 A inkbottle 2 # 'A' changed from pen to ink bottle
6 B pen 2
7 B pencil 3 # though B took pencil before, this is different as he took a pen in between
8 B pencil 3
9 V book 1
10 Z marker 1
11 A inkbottle 2
12 V book 1
13 V pen 2
14 V book 3
In R, we can find changes in a student's selection by finding the difference, diff
, of each subsequent value. When we take the cumulative sum, cumsum
, of that logical index we get a running count of occurrences.
In the second line we coerce the factor variable article.bought
to numeric and run the function from the first line using ave
to group the function f
by student.
f <- function(x) cumsum(c(F, diff(x) != 0)) + 1
df$Occurences <- with(df, ave(as.numeric(article.bought), student, FUN=f))
df
# student article.bought Occurences
# 1 A pen 1
# 2 B pencil 1
# 3 V book 1
# 4 A pen 1
# 5 A inkbottle 2
# 6 B pen 2
# 7 B pencil 3
# 8 B pencil 3
# 9 V book 1
# 10 Z marker 1
# 11 A inkbottle 2
# 12 V book 1
# 13 V pen 2
# 14 V book 3