Formatting to count data


I have the following data.frame:

df = data.frame(a = sample(c(rep(1,23),rep(2,22), rep(3,43), rep(4, 12))), 
                b = sample(c(rep(1,10),rep(2,10), rep(3,20), rep(4, 60))), 
                c = sample(c(rep(1,40),rep(2,5), rep(3,30), rep(4, 25))))

table(df)

I'd like to run a model on these counts. A model of the following kind:

MCMCglmm(fixed = MyCount ~ a + b, random = ~ c, data = new.df)

My question is how to get easily from df to new.df, a data.frame that holds the data in the form the model expects. In other words, how do I derive four variables from the original three, so that there is one count for each combination of a, b and c?

The response variable could be defined as MyCount = c(table(df)), but re-expressing a, b and c to match seems rather complicated to me.

What is the simplest solution? Maybe using the package reshape?


Solution

  • The as.data.frame.table method constructs a "Freq" column, which I am renaming to 'MyCount':

    > new.df <- setNames( as.data.frame(table(df)), c(names(df), "MyCount"))
    > str(new.df)
    'data.frame':   64 obs. of  4 variables:
     $ a      : Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
     $ b      : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
     $ c      : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
     $ MyCount: int  1 2 0 0 1 0 2 2 0 3 ...
    

    BTW, there is no package named "Reshape". Correct capitalization is part of learning R.
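
As a hedged variant of the approach above: as.data.frame.table also takes a responseName argument, so the count column can be named in the same call rather than renamed afterwards. The sketch below rebuilds df from the question (the set.seed call is an assumption added here for reproducibility, not part of the original) and checks that the result has one row per level combination and that the counts add up to the number of rows in df.

```r
set.seed(1)  # assumption: fixed seed so the shuffled columns are reproducible

df <- data.frame(a = sample(c(rep(1, 23), rep(2, 22), rep(3, 43), rep(4, 12))),
                 b = sample(c(rep(1, 10), rep(2, 10), rep(3, 20), rep(4, 60))),
                 c = sample(c(rep(1, 40), rep(2, 5),  rep(3, 30), rep(4, 25))))

# responseName names the count column directly instead of the default "Freq"
new.df <- as.data.frame(table(df), responseName = "MyCount")

# One row per combination of levels, including combinations never observed
nrow(new.df)         # 4 * 4 * 4 = 64

# Every original row is counted exactly once
sum(new.df$MyCount)  # 100, the same as nrow(df)

# The counts appear in the same order as c(table(df))
all(new.df$MyCount == c(table(df)))
```

Note that a, b and c come back as factors. Zero-count combinations are kept as rows, which is usually what a count model needs.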