Search code examples
rfrequency

How to update a frequency table based on a third variable?


I have a data frame that summaries the different class sequences of three years (2006,2007,2008), the variable count indicating the number of occurrence of a class sequence:

df<-data.frame(count=c(150,33,35,26,15,65), 
           Y.2006=c("a","a","a","d","d","d"), 
           Y.2007=c("a","b","b","c","c","c"),
           Y.2008=c("a","b","a","c","c","d") )

that looks like:

  count Y.2006 Y.2007 Y.2008 
1   150      a      a      a     
2    33      a      b      b        
3    35      a      b      a        
4    26      d      c      c        
5    15      d      c      c         
6    65      d      c      d         

I compute the sequence of 2006-2007:

df$Y.2006_2007<-paste(df$Y.2006, df$Y.2007)

The result:

  count Y.2006 Y.2007 Y.2008 Y.2006_2007
1   150      a      a      a         a a
2    33      a      b      b         a b
3    35      a      b      a         a b
4    26      d      c      c         d c
5    15      d      c      c         d c
6    65      d      c      d         d c

I would like the number of occurrence of Y.2006_2007 for the different class of 2008. So I do:

table(df$Y.2006_2007, df$Y.2008)

Resulting in :

      a b c d
  a a 1 0 0 0
  a b 1 1 0 0
  d c 0 0 2 1

Now, I would like to update these results based on the count variable of the initial data frame (df) to take into account the number of occurrence of the class sequences. So I would like:

      a   b   c   d
a a 150   0   0   0
a b  35  33   0   0
d c   0   0  41  65

I can't figure out how to achieve this result. Any ideas?


Solution

  • You can use xtabs to do that with base R:

    xtabs(formula = count ~ Y.2006_2007 + Y.2008, data = df)
    
                  Y.2008
      Y.2006_2007   a   b   c   d
              a a 150   0   0   0
              a b  35  33   0   0
              d c   0   0  41  65
    

    And in case you need a data.frame, you can just wrap it in as.data.frame:

    as.data.frame(xtabs(formula = count ~ Y.2006_2007 + Y.2008, data = df))
       Y.2006_2007 Y.2008 Freq
    1          a a      a  150
    2          a b      a   35
    3          d c      a    0
    4          a a      b    0
    5          a b      b   33
    6          d c      b    0
    7          a a      c    0
    8          a b      c    0
    9          d c      c   41
    10         a a      d    0
    11         a b      d    0
    12         d c      d   65