add column total to new row in data frame R


Suppose I have the following data.

    A <- c(4,4,4,4)
    B <- c(1,2,3,4)
    C <- c(1,2,4,4)
    D <- c(3,2,4,1)

    data <- as.data.frame(rbind(A,B,C,D))
    data <- t(data)
    data <- as.data.frame(data)

    > data
       A B C D
    V1 4 1 1 3
    V2 4 2 2 2
    V3 4 3 4 4
    V4 4 4 4 1

I am looking to add 2 rows at the bottom. I tried rbind(data,colSums(data)) but it is giving me an error, and I'm having trouble finding something that will simply add a row.

The first row added needs to be the sum of the 1st 3 rows in each column. The second added row needs to be the sum of all 4 rows in each column.

so the output should look like this:

    > data
              A  B  C  D
    V1        4  1  1  3
    V2        4  2  2  2
    V3        4  3  4  4
    V4        4  4  4  1
    V1:V3Sum 12  6  7  9
    V1:V4Sum 16 10 11 10

If you want to take a stab at it, I'm then trying to get relative frequencies, which means adding another 5 rows.

Four of those rows would hold each value in a column (V1:V4) divided by the V1:V4Sum value. The 5th row would be V1:V3Sum divided by V1:V4Sum.

    > data
                  A    B    C    D
    V1            4    1    1    3
    V2            4    2    2    2
    V3            4    3    4    4
    V4            4    4    4    1
    V1:V3Sum     12    6    7    9
    V1:V4Sum     16   10   11   10
    relFreqV1   .25   .1  .09   .3
    relFreqV2   .25   .2  .18   .2
    relFreqV3   .25   .3  .36   .4
    relFreqV4   .25   .4  .36   .1
    relFreqTot  .75   .6  .63   .9

* Each of the relFreqV1 to relFreqV4 rows is the value in rows 1:4 divided by V1:V4Sum.
* The last row is V1:V3Sum divided by V1:V4Sum.

Any help is always appreciated!!!


Solution

  • You could accomplish this several ways, including some that are newer and more "tidy", but when the solution is straightforward in base R like this I prefer such an approach:

    rbind(data, colSums(data[1:3,]),colSums(data))
    
        A  B  C  D
    V1  4  1  1  3
    V2  4  2  2  2
    V3  4  3  4  4
    V4  4  4  4  1
    5  12  6  7  9
    6  16 10 11 10
    

    If you'd like the row names to match your desired output, this is one option:

    data           <- rbind(data, colSums(data[1:3,]),colSums(data))
    rownames(data) <- c("V1", "V2", "V3", "V4", "V1:V3Sum", "V1:V4Sum")
    
              A  B  C  D
    V1        4  1  1  3
    V2        4  2  2  2
    V3        4  3  4  4
    V4        4  4  4  1
    V1:V3Sum 12  6  7  9
    V1:V4Sum 16 10 11 10
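
    As a possible shortcut, rbind() on a data frame takes the row label for a vector argument from that argument's name, so the two sum rows can be labelled in the same call. The sketch below rebuilds the question's data directly with data.frame() so it runs on its own; `with_sums` is just an illustrative name, not something from the question.

```r
# Rebuild the question's data: rows V1..V4, columns A..D.
data <- data.frame(A = c(4, 4, 4, 4),
                   B = c(1, 2, 3, 4),
                   C = c(1, 2, 4, 4),
                   D = c(3, 2, 4, 1),
                   row.names = paste0("V", 1:4))

# Named vector arguments to rbind() become row names in the result,
# so the labels do not have to be retyped afterwards.
with_sums <- rbind(data,
                   "V1:V3Sum" = colSums(data[1:3, ]),
                   "V1:V4Sum" = colSums(data))
print(with_sums)
```

    This avoids listing V1 to V4 again in rownames(), which helps if the number of rows changes.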
    

    RELATIVE FREQUENCIES

    You ask for a few more rows of summary statistics (relative frequencies). Dividing each of rows 1 to 4 by the V1:V4Sum row (row 6), and the V1:V3Sum row (row 5) by that same total, matches the output you described:

    rbind(data, 
          data[1,]/data[6,],
          data[2,]/data[6,],
          data[3,]/data[6,],
          data[4,]/data[6,],
          data[5,]/data[6,])
    
                  A    B           C    D
    V1         4.00  1.0  1.00000000  3.0
    V2         4.00  2.0  2.00000000  2.0
    V3         4.00  3.0  4.00000000  4.0
    V4         4.00  4.0  4.00000000  1.0
    V1:V3Sum  12.00  6.0  7.00000000  9.0
    V1:V4Sum  16.00 10.0 11.00000000 10.0
    V11        0.25  0.1  0.09090909  0.3
    V21        0.25  0.2  0.18181818  0.2
    V31        0.25  0.3  0.36363636  0.4
    V41        0.25  0.4  0.36363636  0.1
    V1:V3Sum1  0.75  0.6  0.63636364  0.9
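
    If you also want the relFreq row names from your example, here is a minimal sketch of one way to build the whole table in a single pass, dividing by V1:V4Sum as the question describes. It works on a matrix and converts back to a data frame at the end; the names `m`, `sum3`, `sum4`, `rel` and `result` are illustrative choices, not anything from the question.

```r
# Rebuild the question's data: rows V1..V4, columns A..D.
data <- data.frame(A = c(4, 4, 4, 4),
                   B = c(1, 2, 3, 4),
                   C = c(1, 2, 4, 4),
                   D = c(3, 2, 4, 1),
                   row.names = paste0("V", 1:4))

m    <- as.matrix(data)
sum3 <- colSums(m[1:3, ])   # sum of rows V1..V3 per column
sum4 <- colSums(m)          # sum of rows V1..V4 per column

# sweep() divides every column of m by the matching entry of sum4.
rel <- sweep(m, 2, sum4, "/")
rownames(rel) <- paste0("relFreq", rownames(m))

# Named vector arguments supply the row names for the added rows.
result <- as.data.frame(rbind(m,
                              "V1:V3Sum" = sum3,
                              "V1:V4Sum" = sum4,
                              rel,
                              relFreqTot = sum3 / sum4))
print(round(result, 2))
```

    With this data, relFreqV1 for column A is 4/16 = 0.25 and relFreqTot for column D is 9/10 = 0.9, in line with the figures in the question.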