
How to create a table in R that displays the percentage of observations per year equal to a certain value?


I'm working with a time series dataset on levels of opposition in authoritarian regimes. I've included a sample of the data below. I would like to produce a table that displays the percentage of countries per year with a value of 1 for v2psoppaut (the v2psoppaut_ord column in the sample). Could someone tell me how to go about doing this? I'd like to save the table as a new data frame for plotting.

structure(list(year = 1900:1905, COWcode = c(70L, 70L, 70L, 70L, 
70L, 70L), country_name = c("Mexico", "Mexico", "Mexico", "Mexico", 
"Mexico", "Mexico"), country_text_id = c("MEX", "MEX", "MEX", 
"MEX", "MEX", "MEX"), v2x_regime = c(0L, 0L, 0L, 0L, 0L, 0L), 
    v2psoppaut_ord = c(2L, 2L, 2L, 2L, 2L, 2L)), row.names = c(NA, 
6L), class = "data.frame")

Solution

  • Try using dplyr from the tidyverse: group your data by year, then summarize each group by dividing the number of rows where v2psoppaut_ord equals 1 by the total number of rows in that group (here, each year), which n() returns. Save the result to a new df for plotting. It will have two columns, year and auth, with the latter giving the proportion of countries with a value of 1 for that variable (multiply by 100 to get a percentage). Note that n() also counts rows where v2psoppaut_ord is NA, so missing values stay in the denominator. Calling ungroup() at the end makes sure no grouping carries over into later steps.

    library(tidyverse)
    
    plot_df <- df %>%
      group_by(year) %>%
      # proportion of rows per year where v2psoppaut_ord equals 1 (NA rows still count in n())
      summarize(auth = sum(v2psoppaut_ord == 1, na.rm = TRUE) / n()) %>%
      ungroup()
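
The sample in the question contains only one country, so here is a minimal sketch of the same approach on a small multi-country data frame. The toy rows, the extra country codes, and the column names n_countries, pct_all and pct_non_missing are assumptions made up for illustration and are not part of the original data. It shows the percentage computed two ways, since the choice of denominator matters once a year has missing values.

```r
library(dplyr)

# Hypothetical toy data: three countries over two years (values invented for illustration)
df <- data.frame(
  year = c(1900L, 1900L, 1900L, 1901L, 1901L, 1901L),
  country_text_id = c("MEX", "USA", "CAN", "MEX", "USA", "CAN"),
  v2psoppaut_ord = c(1L, 2L, 1L, 1L, NA, 2L)
)

plot_df <- df %>%
  group_by(year) %>%
  summarize(
    # number of rows (countries) observed in each year, including NA rows
    n_countries = n(),
    # percentage with a value of 1, using all rows in the year as the denominator
    pct_all = 100 * sum(v2psoppaut_ord == 1, na.rm = TRUE) / n(),
    # percentage with a value of 1, using only non-missing rows as the denominator
    pct_non_missing = 100 * mean(v2psoppaut_ord == 1, na.rm = TRUE)
  ) %>%
  ungroup()

print(plot_df)
```

In this toy data the two percentages agree for 1900 and differ for 1901, where one of the three values is NA. Which denominator is appropriate depends on whether a missing code should count as "not 1" or be left out of the calculation.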