Tags: r, dataframe, grouping, proportions

Is there an R function to get variable proportions based on another variable?


This is my data set summary:

               trajectory      context_face                     face    
 Min.   :  1   Pass     :404   Length:1204        Frown           :230  
 1st Qu.: 76   Undecided:400   Class :character   Scrunch         : 62  
 Median :151   Allow    :400   Mode  :character   Gasp            : 87  
 Mean   :151                                      Smile           :394  
 3rd Qu.:226                                      Neutral         :258  
 Max.   :301                                      Pout            : 25  
                                                  Wide-opened eyes:148 

I want to know the proportions (occurrences) of specific faces (e.g., Frown, Scrunch) within each trajectory (Pass, Undecided, Allow). In other words, how many Frowns or Scrunches were there in each trajectory condition?

I tried describeBy(df, df$trajectory) from the psych package, but it didn't work. The dataset is already in long format. I want to see something like this (the values here are made up):

                      Pass     Undecided      Allow
Frown                   10         2             18
Scrunch                 19         20            4
Gasp                    23         18            14
Smile                   19         11            6
Neutral
Pout
Wide-opened eyes

Which function should I use? Thank you very much. This is my first question and I have no idea how to format it; I hope you can still understand my point.


Solution

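  • Here's a tidyverse approach. It takes the example counts from the question as input, reshapes them to long format, and computes each cell's share of the grand total: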

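    library(tidyverse)
    
    # Example counts taken from the desired-output table in the question
    df <- tibble(face = c('Frown','Scrunch','Gasp','Smile'),
                 Pass = c(10,19,23,19),
                 Undecided = c(2,20,18,11),
                 Allow = c(18,4,14,6))
    
    df %>%
      # Reshape to one row per face/trajectory combination
      pivot_longer(cols = c(Pass, Undecided, Allow),
                   names_to = 'trajectory',
                   values_to = 'value') %>%
      # Total count for each face/trajectory combination
      group_by(face, trajectory) %>%
      summarize(occurence = sum(value)) %>%
      ungroup() %>%
      # Proportion of each cell relative to the grand total of all counts
      mutate(proportion = occurence/sum(occurence))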
    #> `summarise()` has grouped output by 'face'. You can override using the
    #> `.groups` argument.
    #> # A tibble: 12 × 4
    #>    face    trajectory occurence proportion
    #>    <chr>   <chr>          <dbl>      <dbl>
    #>  1 Frown   Allow             18     0.110 
    #>  2 Frown   Pass              10     0.0610
    #>  3 Frown   Undecided          2     0.0122
    #>  4 Gasp    Allow             14     0.0854
    #>  5 Gasp    Pass              23     0.140 
    #>  6 Gasp    Undecided         18     0.110 
    #>  7 Scrunch Allow              4     0.0244
    #>  8 Scrunch Pass              19     0.116 
    #>  9 Scrunch Undecided         20     0.122 
    #> 10 Smile   Allow              6     0.0366
    #> 11 Smile   Pass              19     0.116 
    #> 12 Smile   Undecided         11     0.0671
    

    Created on 2023-03-16 with reprex v2.0.2
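Because the data in the question is already in long format (one row per observed face), the counts can also be cross-tabulated directly in base R, without first building a wide table. The sketch below uses table() for the face-by-trajectory counts and prop.table() with margin = 2 so that each trajectory column sums to 1. The small data frame is made up for illustration and only assumes the column names face and trajectory shown in the question's summary.

```r
# Hypothetical long-format data: one row per observed face, together with
# the trajectory condition it occurred in (values are made up).
df <- data.frame(
  trajectory = factor(c("Pass", "Pass", "Undecided", "Allow", "Allow", "Pass"),
                      levels = c("Pass", "Undecided", "Allow")),
  face = c("Frown", "Smile", "Frown", "Scrunch", "Smile", "Smile")
)

# Counts of each face within each trajectory (rows = face, columns = trajectory)
counts <- table(face = df$face, trajectory = df$trajectory)
print(counts)

# Proportions within each trajectory: margin = 2 divides every column by its total
props <- prop.table(counts, margin = 2)
print(round(props, 3))
```

On the real data, table(df$face, df$trajectory) should produce the face-by-trajectory layout requested in the question. In R 4.0.0 and later, proportions() is a newer name for prop.table(); using margin = 1 instead gives each face's distribution across the three trajectories.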