I'm working with a time series dataset on levels of opposition in authoritarian regimes. I've included a sample of the data below. I would like to produce a table that displays the percentage of countries per year with a value of 1 for v2psoppaut. Could someone tell me how to go about doing this? I'd like to produce a table that I can save as a new df for plotting.
structure(list(year = 1900:1905, COWcode = c(70L, 70L, 70L, 70L,
70L, 70L), country_name = c("Mexico", "Mexico", "Mexico", "Mexico",
"Mexico", "Mexico"), country_text_id = c("MEX", "MEX", "MEX",
"MEX", "MEX", "MEX"), v2x_regime = c(0L, 0L, 0L, 0L, 0L, 0L),
v2psoppaut_ord = c(2L, 2L, 2L, 2L, 2L, 2L)), row.names = c(NA,
6L), class = "data.frame")
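Try using dplyr from the tidyverse: group your data by year, then summarize (aggregate) each group by dividing the number of rows where v2psoppaut_ord equals 1 by the total number of rows in that group (i.e. that year), which the n() function returns. This assumes one row per country per year. Save the result to a new df for plotting. It will have two columns: year and auth, with the latter giving the proportion (multiply by 100 to get a percentage) of countries with a value of 1 for the variable you indicated. Note that na.rm only drops missing values from the count in the numerator; n() still includes those rows in the denominator. Don't forget to ungroup the data with ungroup(); with a single grouping variable summarize() already returns an ungrouped result, so the call is just a safeguard.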
library(tidyverse)

plot_df <- df %>%
  group_by(year) %>%
  # share of rows (countries) in each year where v2psoppaut_ord equals 1
  summarize(auth = sum(v2psoppaut_ord == 1, na.rm = TRUE) / n()) %>%
  ungroup()
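
As a quick check, the same idea can be run on the sample data from the question. This is only a minimal sketch: the data frame name df is taken from the answer, the column name pct_auth is made up for illustration, and it swaps sum(...) / n() for mean() of the logical comparison. That variant differs in one respect: with na.rm = TRUE, missing values are dropped from the denominator as well, so use whichever matches how you want to treat missing years.

```r
library(dplyr)

# Sample data from the question (Mexico, 1900-1905), rebuilt by hand.
# Length-1 values are recycled to match the six years.
df <- data.frame(
  year = 1900:1905,
  COWcode = 70L,
  country_name = "Mexico",
  country_text_id = "MEX",
  v2x_regime = 0L,
  v2psoppaut_ord = 2L
)

plot_df <- df %>%
  group_by(year) %>%
  # mean() of a logical vector is the share of TRUE values;
  # multiplying by 100 turns the proportion into a percentage.
  # pct_auth is a hypothetical column name.
  summarize(pct_auth = 100 * mean(v2psoppaut_ord == 1, na.rm = TRUE))

plot_df
```

Every year in this sample comes out as 0, because the only country shown has a value of 2 throughout; on the full dataset each row of plot_df would hold the percentage for that year.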