
Summarising Factors in Dataframe


My dataset is as below: http://dl.dropbox.com/u/822467/Data.csv

I have a series of 27 questions whose responses are binary, coded as 0 = No, 1 = Yes, 999 = Missing.

My first problem is how to turn all the columns into factors. I can do them one by one using as.factor, but that takes forever.

My second problem is that I need a summary table with the questions as column headers, Yes and No in the first column, and each cell holding the frequency of Yes or No for that question.

I would also need another data frame with the percentages. I'd greatly appreciate any help. I've looked into summarize and summary from the Hmisc package and so on, to no avail.


Solution

  • Four lines of code...

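    dat <- read.csv("http://dl.dropbox.com/u/822467/Data.csv")
    # Convert every column except the first to a factor; labelling 999 as NA
    # turns the missing code into a real NA
    dat[, -1] <- lapply(dat[, -1], factor, levels=c(0, 1, 999), 
        labels=c("No", "Yes", NA))
    # One row of counts per question, including a count of NAs
    xx <- do.call(rbind, lapply(dat[, -1], table, useNA="always"))
    # Counts, row totals, then row-wise proportions (the last three columns)
    cbind(xx, sum=rowSums(xx), prop.table(xx, margin=1))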
    

    ... produces this result:

        No Yes <NA> sum       No      Yes     <NA>
    Q1   7  57    0  64 0.109375 0.890625 0.000000
    Q2  40  22    2  64 0.625000 0.343750 0.031250
    Q3  28  36    0  64 0.437500 0.562500 0.000000
    Q4  43  18    3  64 0.671875 0.281250 0.046875
    Q5  24  39    1  64 0.375000 0.609375 0.015625
    Q6  21  42    1  64 0.328125 0.656250 0.015625
    Q7  15  49    0  64 0.234375 0.765625 0.000000
    Q8   4  60    0  64 0.062500 0.937500 0.000000
    Q9  60   4    0  64 0.937500 0.062500 0.000000
    Q10 39  25    0  64 0.609375 0.390625 0.000000
    Q11 55   8    1  64 0.859375 0.125000 0.015625
    Q12 20  44    0  64 0.312500 0.687500 0.000000
    Q13 49  15    0  64 0.765625 0.234375 0.000000
    Q14 49  15    0  64 0.765625 0.234375 0.000000
    Q15 51  13    0  64 0.796875 0.203125 0.000000
    Q16 61   3    0  64 0.953125 0.046875 0.000000
    Q17 41  23    0  64 0.640625 0.359375 0.000000
    Q18 60   4    0  64 0.937500 0.062500 0.000000
    Q19 64   0    0  64 1.000000 0.000000 0.000000
    Q20 60   4    0  64 0.937500 0.062500 0.000000
    Q21 60   4    0  64 0.937500 0.062500 0.000000
    Q22 43  21    0  64 0.671875 0.328125 0.000000
    Q23 59   4    1  64 0.921875 0.062500 0.015625
    Q24 10  54    0  64 0.156250 0.843750 0.000000
    Q25 54   9    1  64 0.843750 0.140625 0.015625
    Q26 24  39    1  64 0.375000 0.609375 0.015625
    Q27  0   0   64  64 0.000000 0.000000 1.000000
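
The question asked for the questions as column headers and a separate data frame of percentages, whereas the table above has one row per question. A minimal sketch of that other layout follows. It is not part of the original answer: the `toy` data frame and the names `counts_df` and `pct_df` are made up for illustration, and it assumes the first column is an ID, as in the code above. Note that the percentages here are taken over non-missing answers only, which differs from the proportions above (those divide by all 64 rows, NAs included).

```r
# Hypothetical toy data standing in for Data.csv: an ID column plus three
# questions coded 0 = No, 1 = Yes, 999 = Missing.
toy <- data.frame(
  ID = 1:6,
  Q1 = c(1, 1, 0, 1, 999, 1),
  Q2 = c(0, 0, 1, 0, 0, 999),
  Q3 = c(1, 0, 1, 1, 1, 1)
)

# Recode 999 as NA, then convert every question column to a factor.
toy[, -1] <- lapply(toy[, -1], function(x) {
  x[x == 999] <- NA
  factor(x, levels = c(0, 1), labels = c("No", "Yes"))
})

# Every table has the same two levels, so sapply() simplifies the result
# to a matrix with No/Yes as rows and the questions as columns.
counts <- sapply(toy[, -1], table)

# Column-wise percentages among the non-missing answers.
pct <- round(100 * prop.table(counts, margin = 2), 1)

# Two separate data frames: frequencies and percentages.
counts_df <- as.data.frame(counts)
pct_df <- as.data.frame(pct)

print(counts_df)
print(pct_df)
```

If the percentages should include missing answers in the denominator, one option is to build the tables with `useNA="always"` as in the answer above and transpose the result with `t()`.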