
Summarizing count data as proportions in a data.frame


dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))
df_dummy <- data.frame(Question = c("Q1", "Q2", "Q3"),
                       X1 = c(2/4, 3/4, 2/4),
                       X0 = c(2/4, 1/4, 2/4))

> dummy
  Q1 Q2 Q3
1  0  1  0
2  1  1  1
3  0  0  1
4  1  1  0

> df_dummy
  Question   X1   X0
1       Q1 0.50 0.50
2       Q2 0.75 0.25
3       Q3 0.50 0.50

I have some data (dummy) with binary responses to Q1, Q2 and Q3. I want to summarize it in the format shown in df_dummy: for each question, column X1 gives the proportion of people who answered 1 and column X0 gives the proportion of people who answered 0. I tried prop.table, but that didn't return the desired result.
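For reference, without a margin argument prop.table(x) simply computes x / sum(x), so applied to the raw 0/1 data frame it rescales every cell by the grand total instead of tabulating each question. A minimal sketch of using it as intended, assuming every response is coded 0 or 1, is to build a count table per column first and then convert each one to proportions:

```r
dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))

# Tabulate each column separately. Fixing the factor levels keeps the
# order (1 first, then 0) and reports a 0 proportion if a value never occurs.
props <- t(sapply(dummy, function(x) {
  prop.table(table(factor(x, levels = c(1, 0))))
}))

# props is a matrix with one row per question and columns "1" and "0".
df_dummy <- data.frame(Question = rownames(props),
                       X1 = unname(props[, "1"]),
                       X0 = unname(props[, "0"]),
                       stringsAsFactors = FALSE)
df_dummy
```

Because the levels are set explicitly, the same pattern extends to answers with more than two codes by adding them to levels.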


Solution

  • Another way is to compute the proportion of 1s and deduce the proportion of 0s from it (this works because every response is either 0 or 1):

    X1 <- colSums(dummy==1)/nrow(dummy)
    df_dummy <- data.frame(X1, X0=1-X1)
    df_dummy
    #     X1   X0
    #Q1 0.50 0.50
    #Q2 0.75 0.25
    #Q3 0.50 0.50
    

    NB, inspired by @akrun's idea of using colMeans: you can also use colMeans instead of dividing colSums by the number of rows to define X1:

    X1 <- colMeans(dummy==1)
    df_dummy <- data.frame(X1, X0=1-X1)
    df_dummy
    #     X1   X0
    #Q1 0.50 0.50
    #Q2 0.75 0.25
    #Q3 0.50 0.50
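Both versions above keep the question names as row names, while df_dummy in the question has them in a Question column. A minimal sketch that matches that layout exactly, and that first checks the assumption behind X0 = 1 - X1 (only 0 and 1 values, no NA), could look like this:

```r
dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))

# X0 = 1 - X1 is only valid if every answer is 0 or 1 (no NA, no other codes).
stopifnot(all(unlist(dummy) %in% c(0, 1)))

X1 <- colMeans(dummy == 1)
# Move the question names from the row names into their own column,
# matching the layout of df_dummy in the question.
df_dummy <- data.frame(Question = names(X1),
                       X1 = unname(X1),
                       X0 = unname(1 - X1),
                       stringsAsFactors = FALSE)
df_dummy
#   Question   X1   X0
# 1       Q1 0.50 0.50
# 2       Q2 0.75 0.25
# 3       Q3 0.50 0.50
```

If the data may contain NA, colMeans(dummy == 1, na.rm = TRUE) gives proportions among non-missing answers, but then the stopifnot check above would need to allow NA as well.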