
Calculate how often two countries vote the same way


I'm trying to calculate how often, on average, Germany has agreed with the US on votes in the UN General Assembly since 1990. For this, I'm using the unvotes package (hosted on both CRAN and GitHub), which provides data on the voting history of countries in the United Nations General Assembly. I'm focusing on the un_votes and un_roll_calls datasets, which I merged.

So far I have this:

##count how often Germany and the US agreed on a resolution in each year:
countries <- c("United States", "Germany")

by_country_year <- merged %>%
  group_by(year = year(date), country, unres, rcid, vote) %>%
    filter(country %in% countries, year >= 1990)

but I am completely lost as to how I can go ahead. Any leads?


Solution

  • You can filter according to dates and country, then group by rcid, before summarizing to create two separate columns for the US and German votes.

    library(tidyverse)
    library(unvotes)
    
    merged <- un_votes %>% inner_join(un_roll_calls, by = "rcid") 
    
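    # Keep votes cast since 1990 by the two countries, then put each
    # country's vote on a roll call (rcid) into its own column
    result <- merged %>% 
      filter(date >= as.Date('1990-01-01')) %>%
      filter(country %in% c('United States', 'Germany')) %>%
      group_by(rcid) %>%
      summarize(US_vote = vote[country == 'United States'],
                Germany_vote = vote[country == 'Germany'])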
    

    This makes it possible to cross-tabulate the two countries' votes on every roll call:

    table(US = result$US_vote, Germany = result$Germany_vote)
    #>          Germany
    #> US        yes abstain  no
    #>   yes     629      15   4
    #>   abstain 262      89   0
    #>   no      719     399 423
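
    As a rough sketch, the overall agreement rate can be read straight off this table. The counts below are copied by hand from the output above, and the names tab, agree_all and agree_decided are only illustrative; with the live data, the same arithmetic could be applied to the table() result directly.

    ```r
    # Counts copied from the cross-tabulation above (rows = US, columns = Germany)
    tab <- matrix(
      c(629,  15,   4,
        262,  89,   0,
        719, 399, 423),
      nrow = 3, byrow = TRUE,
      dimnames = list(US = c("yes", "abstain", "no"),
                      Germany = c("yes", "abstain", "no"))
    )

    # Share of roll calls on which both cast the same vote (abstentions included)
    agree_all <- sum(diag(tab)) / sum(tab)

    # Share of agreement on roll calls where neither country abstained
    decided <- tab[c("yes", "no"), c("yes", "no")]
    agree_decided <- sum(diag(decided)) / sum(decided)

    round(c(all = agree_all, decided = agree_decided), 3)
    ```

    With these counts, the two countries cast the same vote on about 45% of all roll calls (1141 of 2540) and on about 59% of those where neither abstained (1052 of 1775).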
    

    We can also test whether the proportion of agreement differs from 0.5, i.e. from what we would expect if agreeing were a coin flip. Let's first drop the abstentions and then use prop.test():

    result <- result %>% filter(US_vote != 'abstain' & Germany_vote != 'abstain')
    
    prop.test(sum(result$US_vote == result$Germany_vote), nrow(result))
    #> 
    #>  1-sample proportions test with continuity correction
    #> 
    #> data:  sum(result$US_vote == result$Germany_vote) out of nrow(result), null probability 0.5
    #> X-squared = 60.611, df = 1, p-value = 6.955e-15
    #> alternative hypothesis: true p is not equal to 0.5
    #> 95 percent confidence interval:
    #>  0.5693588 0.6155882
    #> sample estimates:
    #>         p 
    #> 0.5926761 
    

    This means that on votes where neither abstained, Germany and the US voted the same way about 59% of the time, significantly more often than the 50% assumed under the null hypothesis. I find this reassuring.
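
    Since the question asks about agreement per year, here is one possible sketch of a yearly breakdown. The helper agreement_by_year() is hypothetical (it is not part of unvotes); it assumes a data frame with the columns rcid, date, country and vote, as in the merged data, and it counts two abstentions as agreement.

    ```r
    library(dplyr)

    # Hypothetical helper: yearly share of roll calls on which two countries
    # cast the same vote. Expects columns rcid, date, country and vote.
    agreement_by_year <- function(votes,
                                  pair = c("United States", "Germany"),
                                  since = as.Date("1990-01-01")) {
      votes %>%
        filter(date >= since, country %in% pair) %>%
        group_by(rcid, date) %>%
        # Keep only roll calls on which both countries have a recorded vote
        filter(n_distinct(country) == 2) %>%
        # TRUE when both votes are identical (including abstain/abstain)
        summarize(agree = n_distinct(vote) == 1, .groups = "drop") %>%
        mutate(year = as.integer(format(date, "%Y"))) %>%
        group_by(year) %>%
        summarize(n_votes = n(), share_agree = mean(agree))
    }

    # Usage with the merged data from above:
    # yearly <- agreement_by_year(merged)
    # mean(yearly$share_agree)   # unweighted average of the yearly shares
    ```

    Averaging the yearly shares weights every year equally, whereas the overall proportion weights every roll call equally; which of the two is the better "average" depends on the question being asked.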

    Created on 2022-10-04 with reprex v2.0.2