
Relative and cumulative percentages for levels of an ordered factor variable in R


The title says it all. I searched the internet extensively but could not find an answer.

The topic "Make Frequency Histogram for Factor Variables" does exactly what I need, but for a plot rather than a table. I have an ordered factor variable, and I need to calculate the relative and cumulative percentages for each level, as if the levels were numeric values. I would like to calculate the percentages and save them in a separate table. Any suggestions? Thank you in advance.


Solution

  • Is this what you mean?

    # No seed is set, so the numbers shown below come from one particular run
    X <- sample(LETTERS[1:5], 1000, replace = TRUE)
    X <- factor(X, ordered = TRUE)
    prop.table(table(X))
    # X
    #     A     B     C     D     E 
    # 0.210 0.187 0.180 0.222 0.201
    
    cumsum(prop.table(table(X)))
    #     A     B     C     D     E 
    # 0.210 0.397 0.577 0.799 1.000
    

    This is essentially @Roland's answer from the question you referenced.
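
    Since the question asks for the percentages to be saved in a separate table, here is a minimal sketch of one way to collect them in a data frame. The seed, the name freq_table, and the column names level, n, rel_pct and cum_pct are my own choices for illustration, not anything from the referenced answer.

    ```r
    set.seed(42)  # assumption: fixed seed, only to make the sketch reproducible
    X <- factor(sample(LETTERS[1:5], 1000, replace = TRUE), ordered = TRUE)

    counts <- table(X)          # absolute frequency of each level
    rel <- prop.table(counts)   # relative frequencies, summing to 1

    # Hypothetical column names; rename to taste
    freq_table <- data.frame(
      level   = names(counts),
      n       = as.vector(counts),
      rel_pct = 100 * as.vector(rel),          # relative percentage per level
      cum_pct = 100 * cumsum(as.vector(rel))   # cumulative percentage in level order
    )
    freq_table
    ```

    Because table() lists the levels in the factor's own order, the cumulative column follows the ordering of the factor.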

    EDIT (Response to OP's comment)

    # The counts below come from a different random draw of X than the proportions shown earlier
    Y <- table(X)
    str(Y)
    #  'table' int [1:5(1d)] 205 191 200 183 221
    #  - attr(*, "dimnames")=List of 1
    #   ..$ X: chr [1:5] "A" "B" "C" "D" ...
    Z <- c(table(X))
    str(Z)
    #  Named int [1:5] 205 191 200 183 221
    #  - attr(*, "names")= chr [1:5] "A" "B" "C" "D" ...
    

    So Y is of class "table", whereas Z is a named integer vector. The main difference is how R functions treat the two classes: compare plot(Y) with plot(Z), or data.frame(Y) with data.frame(Z). Note, however, that some functions, such as sum(), return the same result for both.
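
    To make the class difference concrete, here is a small sketch of the conversion to a data frame. The seed and the names df_Y and df_Z are assumptions for illustration, and I use as.data.frame() for the table so that the table method is called explicitly.

    ```r
    set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
    X <- factor(sample(LETTERS[1:5], 1000, replace = TRUE), ordered = TRUE)

    Y <- table(X)  # object of class "table"
    Z <- c(Y)      # c() drops the table class, leaving a named integer vector

    class(Y)  # "table"
    class(Z)  # "integer"

    # The table method produces one row per level, with columns X and Freq
    df_Y <- as.data.frame(Y)

    # The named vector becomes a single column Z, with the levels as row names
    df_Z <- data.frame(Z)

    sum(Y) == sum(Z)  # TRUE: the counts themselves are identical
    ```

    If the goal is a tidy table with the levels in their own column, the "table" route is usually the more convenient starting point.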