r, dataframe, tabular, summary

Count the levels of multiple variables and return a tabular result


I would like to put the output from a summary command into a data table. For example, with this data frame:

   Person     V1     V2     V3     V4
1       A medium medium medium   high
2       B medium medium    low    low
3       V   high   high medium medium
4       D medium medium    low   high
5       E   high   high medium    low
6       F medium medium    low    low
7       G   high   high    low   high
8       H medium    low medium    low
9       I medium medium    low medium
10      J medium    low medium    low

x.df<-structure(list(Person = structure(c(1L, 2L, 10L, 3L, 4L, 5L, 
6L, 7L, 8L, 9L), .Label = c("A", "B", "D", "E", "F", "G", "H", 
"I", "J", "V"), class = "factor"), V1 = structure(c(2L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 2L, 2L), .Label = c("high", "medium"), class = "factor"), 
V2 = structure(c(3L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 3L, 2L), .Label = c("high", 
"low", "medium"), class = "factor"), V3 = structure(c(2L, 
1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L), .Label = c("low", "medium"
), class = "factor"), V4 = structure(c(1L, 2L, 3L, 1L, 2L, 
2L, 1L, 2L, 3L, 2L), .Label = c("high", "low", "medium"), class = "factor")), .Names = c("Person", 
"V1", "V2", "V3", "V4"), class = "data.frame", row.names = c(NA, 
-10L))

With summary(x.df), I get the counts for each factor level:

     Person       V1         V2         V3         V4   
 A      :1   high  :3   high  :3   low   :5   high  :3  
 B      :1   medium:7   low   :2   medium:5   low   :5  
 D      :1              medium:5              medium:2  
 E      :1                                              
 F      :1                                              
 G      :1                                              
 (Other):4                                              

Ideally, I would like a data frame of the counts for each factor level, i.e.:

  Var low medium high
1  V1   0      7    3
2  V2   2      5    3
3  V3   5      5    0
4  V4   5      2    3

with each row summing to 10.


Solution

  • Here is one way to get the counts for each variable into a matrix.

    myMat <- sapply(x.df[-1],
                    function(x) table(factor(x, levels=c("low", "medium", "high"))))
    

    The idea is to use sapply to loop over the variables, convert each one to a factor with the desired levels, and then call table on the result. Because the levels are set explicitly, a level that does not occur in a variable is still reported, with a count of 0.

    This returns

    myMat
           V1 V2 V3 V4
    low     0  2  5  5
    medium  7  5  5  2
    high    3  3  0  3
    

    To get the layout you asked for, transpose it with t (the result is still a matrix, with the variables as row names):

    t(myMat)
       low medium high
    V1   0      7    3
    V2   2      5    3
    V3   5      5    0
    V4   5      2    3
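
Since the question asks for a data frame with a Var column rather than a matrix, one possible last step is to move the row names into a column. The sketch below is not part of the original answer: it rebuilds the question's data from plain vectors so that it runs on its own, and the names lvls, counts, and result are chosen here purely for illustration.

```r
# Rebuild the question's data so the block runs on its own.
x.df <- data.frame(
  Person = c("A", "B", "V", "D", "E", "F", "G", "H", "I", "J"),
  V1 = c("medium", "medium", "high", "medium", "high",
         "medium", "high", "medium", "medium", "medium"),
  V2 = c("medium", "medium", "high", "medium", "high",
         "medium", "high", "low", "medium", "low"),
  V3 = c("medium", "low", "medium", "low", "medium",
         "low", "low", "medium", "low", "medium"),
  V4 = c("high", "low", "medium", "high", "low",
         "low", "high", "low", "medium", "low"),
  stringsAsFactors = TRUE
)

# Illustrative name: the level order wanted in the output columns.
lvls <- c("low", "medium", "high")

# Setting the levels explicitly makes table() report absent levels as 0.
myMat <- sapply(x.df[-1], function(x) table(factor(x, levels = lvls)))

# Transpose, then move the row names into a "Var" column.
# Passing row.names = NULL explicitly gives plain 1..n row names.
counts <- t(myMat)
result <- data.frame(Var = rownames(counts), counts, row.names = NULL)

print(result)
```

This should give the layout requested in the question, and rowSums(result[-1]) can be used to check that every row adds up to 10. Whether Var comes out as character or factor depends on the R version's stringsAsFactors default.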