I have a data frame that summaries the different class sequences of three years (2006,2007,2008), the variable count indicating the number of occurrence of a class sequence:
df<-data.frame(count=c(150,33,35,26,15,65),
Y.2006=c("a","a","a","d","d","d"),
Y.2007=c("a","b","b","c","c","c"),
Y.2008=c("a","b","a","c","c","d") )
that looks like:
count Y.2006 Y.2007 Y.2008
1 150 a a a
2 33 a b b
3 35 a b a
4 26 d c c
5 15 d c c
6 65 d c d
I compute the sequence of 2006-2007:
df$Y.2006_2007<-paste(df$Y.2006, df$Y.2007)
The result:
count Y.2006 Y.2007 Y.2008 Y.2006_2007
1 150 a a a a a
2 33 a b b a b
3 35 a b a a b
4 26 d c c d c
5 15 d c c d c
6 65 d c d d c
I would like the number of occurrence of Y.2006_2007 for the different class of 2008. So I do:
table(df$Y.2006_2007, df$Y.2008)
Resulting in :
a b c d
a a 1 0 0 0
a b 1 1 0 0
d c 0 0 2 1
Now, I would like to update these results based on the count variable of the initial data frame (df) to take into account the number of occurrence of the class sequences. So I would like:
a b c d
a a 150 0 0 0
a b 35 33 0 0
d c 0 0 41 65
I can't figure out how to achieve this result. Any ideas?
You can use xtabs
to do that with base R:
xtabs(formula = count ~ Y.2006_2007 + Y.2008, data = df)
Y.2008
Y.2006_2007 a b c d
a a 150 0 0 0
a b 35 33 0 0
d c 0 0 41 65
And in case you need a data.frame, you can just wrap it in as.data.frame
:
as.data.frame(xtabs(formula = count ~ Y.2006_2007 + Y.2008, data = df))
Y.2006_2007 Y.2008 Freq
1 a a a 150
2 a b a 35
3 d c a 0
4 a a b 0
5 a b b 33
6 d c b 0
7 a a c 0
8 a b c 0
9 d c c 41
10 a a d 0
11 a b d 0
12 d c d 65