
How to calculate percentages of a categorical variable by item and time bin?


I have a question about calculating percentages by item and time bin. The experiment works as follows:

I conducted an eye-tracking experiment. Participants were asked to describe pictures, each containing two areas of interest (AOIs), which I call Agent and Patient. Their eye movements (fixations on the two AOIs) were recorded while they planned their descriptions. From these recordings I built the dataset below, which contains the time information and the fixated AOI. The whole period from picture onset was divided into consecutive time bins of 40 ms each.

Stimulus   Participant   AOIs      time_bin
1          M1            agent     1
1          M1            patient   2
1          M1            patient   3
1          M1            agent     4
...
1          M2            agent     1
1          M2            agent     2
1          M2            agent     3
1          M2            patient   4
...
1          M3            agent     1
1          M3            agent     2
1          M3            agent     3
1          M3            patient   4
...

2          M1            agent     1
2          M1            agent     2
2          M1            patient   3
2          M1            patient   4
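To make the example reproducible, the rows shown above can be typed into a data frame. This is a sketch that contains only the listed rows (the rows elided with "..." are left out), and the name `df` is an assumption chosen to match the answer below.

```r
# Only the rows shown in the question; rows elided with "..." are omitted.
df <- data.frame(
  Stimulus    = rep(c(1, 2), times = c(12, 4)),
  Participant = c(rep(c("M1", "M2", "M3"), each = 4), rep("M1", 4)),
  AOIs        = c("agent", "patient", "patient", "agent",   # stimulus 1, M1
                  "agent", "agent",   "agent",   "patient", # stimulus 1, M2
                  "agent", "agent",   "agent",   "patient", # stimulus 1, M3
                  "agent", "agent",   "patient", "patient"),# stimulus 2, M1
  time_bin    = rep(1:4, times = 4)
)
```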

I would like to create a table containing the proportion of fixations on one AOI (e.g. agent) for each stimulus in each time bin, pooled over participants. It would look like this:

Stimulus      time_bin      percentage     
1                1            20%              
1                2            40%               
1                3            55%               
1                4            60%    
...
2                1            30%              
2                2            35%               
2                3            40%               
2                4            45% 
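Put as a formula (the symbols are introduced here for illustration): for stimulus s and time bin t, let n be the number of participants with a fixation in that bin and a the number of those fixations that fall on the agent AOI. Then

```latex
p_{s,t} = 100 \times \frac{a_{s,t}}{n_{s,t}}
```

For example, using only the rows shown for stimulus 1, time bin 4 (M1 on agent, M2 and M3 on patient), a = 1 and n = 3, so p = 100 × 1/3 ≈ 33.3%. The real value also depends on the participants elided with "...", and the percentages in the table above are illustrative, so they will not match this.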

I need these percentages because I want to run a multilevel analysis (growth curve analysis) of the relationship between the dependent variable, the proportion of agent fixations, and the independent variable time_bin, with stimulus as a random effect.
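A minimal sketch of such a model, assuming the aggregated table has a numeric `proportion` column and following the common growth-curve practice of using orthogonal polynomial time terms built with base R `poly()`. The eight proportions are the illustrative values from the desired-output table above, not real data; the quadratic degree, the random-intercept-only structure, and the column names `agg`, `ot1` and `ot2` are assumptions. The model is fitted only if the lme4 package is installed.

```r
# Illustrative values copied from the desired-output table (not real data)
agg <- data.frame(
  Stimulus   = factor(rep(1:2, each = 4)),
  time_bin   = rep(1:4, times = 2),
  proportion = c(0.20, 0.40, 0.55, 0.60, 0.30, 0.35, 0.40, 0.45)
)

# Orthogonal linear (ot1) and quadratic (ot2) time terms over the distinct bins
bins    <- sort(unique(agg$time_bin))
t_poly  <- poly(bins, degree = 2)
idx     <- match(agg$time_bin, bins)
agg$ot1 <- t_poly[idx, 1]
agg$ot2 <- t_poly[idx, 2]

# Assumed structure: random intercept for Stimulus; fitted only if lme4 exists
if (requireNamespace("lme4", quietly = TRUE)) {
  gca <- lme4::lmer(proportion ~ ot1 + ot2 + (1 | Stimulus), data = agg)
  print(summary(gca))
}
```

With only two stimuli the random-effect variance is poorly estimated, so treat this as a template for the full dataset rather than a meaningful fit.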

I hope my question is clear despite my limited English.

Any idea or suggestion would be a great help!


Solution

  • Using the tidyverse ecosystem, you could try:

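    library(tidyverse)
    
    df %>%
      # 1 if this row's fixation is on the agent AOI, 0 otherwise
      mutate(is_agent = as.integer(AOIs == "agent")) %>%
      # one group per stimulus and time bin, pooling participants
      group_by(Stimulus, time_bin) %>%
      # the mean of a 0/1 indicator is the proportion of agent fixations;
      # .groups = "drop" returns an ungrouped result (dplyr >= 1.0.0)
      summarise(proportion = mean(is_agent), .groups = "drop")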
    

    Note that this gives proportions in the [0, 1] interval, not percentages. To get percentages, multiply by 100; append "%" (e.g. with paste0()) only for display, because the growth curve model needs the numeric value.
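For readers who prefer base R, the same aggregation can be sketched with `aggregate()`, followed by the conversion described in the note above. The input is assumed to have the question's column names (`Stimulus`, `time_bin`, `AOIs`); the six rows below are the stimulus 1 rows shown for time bins 1 and 4, used as a stand-in so the snippet runs on its own.

```r
# Stand-in data: stimulus 1, bins 1 and 4, participants M1-M3 (from the question)
df <- data.frame(
  Stimulus = 1,
  time_bin = c(1, 1, 1, 4, 4, 4),
  AOIs     = c("agent", "agent", "agent", "agent", "patient", "patient")
)

# 0/1 indicator for a fixation on the agent AOI
df$is_agent <- as.numeric(df$AOIs == "agent")

# Mean of the indicator per stimulus and time bin = proportion of agent fixations
agg <- aggregate(is_agent ~ Stimulus + time_bin, data = df, FUN = mean)
names(agg)[names(agg) == "is_agent"] <- "proportion"

# Display-only percentage string; keep `proportion` for modelling
agg$percentage <- paste0(round(100 * agg$proportion, 1), "%")
print(agg)
```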