
How do I use combn generally when pairs are not always possible?


I am looking for a generic method to deal with situations in which combinations are required, but the data do not always meet the assumptions of the combn function.

Specifically, I have a dataframe of members of Congress and their committee assignments. To examine this network of politicians, I want to associate (that is, create links between) any members who belong to the same committees.

The data look like this:

name_id     assignment
 A000374    Agriculture
 A000370    Agriculture
 A000055 Appropriations
 A000371 Appropriations
 A000372    Agriculture
 A000376        Foreign

So, the resulting network data should look like this:

from       to          committee
A000374    A000370     Agriculture
A000055    A000371     Appropriations

The problem is that my code (below) throws an error because there are not always pairings: when a committee has only one member, combn fails with "n < m". I assume I need some command in the code that identifies such cases. Is that the right approach, and if so, how does one create a command that accounts for this problem generally?

Here is my code, currently:

library(RCurl)
library(dplyr)  # %>%, group_by, do, summarise
library(tidyr)  # separate

x <- getURL("https://raw.githubusercontent.com/bac3917/Cauldron/master/cstack.csv")
cstack <- read.csv(text = x)

# split the string into two columns that represent name_id and committee assignment
cstack <- cstack %>% separate(namePaste, c("name_id","assignment"))

# use combn and dplyr to create pairs (results in error)
edges <- cstack %>%
  group_by(assignment) %>%
  do(as.data.frame(t(combn(.[["name_id"]], 2)))) %>%
  group_by(V1, V2) %>%
  summarise(n())
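
The failure can be reproduced in isolation, without downloading the file. This is a minimal sketch that assumes a committee with a single member, like "Foreign" in the sample above; the variable names are only for illustration.

```r
# A committee with a single member, as with "Foreign" in the sample data
single <- c("A000376")

# combn() cannot choose 2 items from 1, so it stops with an error;
# tryCatch() captures the error message instead of halting the script
result <- tryCatch(combn(single, 2), error = function(e) conditionMessage(e))
result
```

Inside the grouped pipeline, one such group is enough to abort the whole do() call.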

Solution

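  • As Ben mentioned, combn(x, 2) does not work when x has fewer than two elements. You could define a function that calls combn only when length(x) > 1; for smaller groups it returns NULL, so those groups contribute no rows to the result. Below is a data.table version.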

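    library(data.table)
    # Read the CSV and split namePaste at the first space only, so multi-word
    # committee names such as "Ranking Member" stay intact
    cstack <- fread("https://raw.githubusercontent.com/bac3917/Cauldron/master/cstack.csv",
        header=TRUE)[, tstrsplit(sub(" ", "\01", namePaste), "\01")]
    setnames(cstack, c("name_id","assignment"))
    # Return all pairs for a group, or NULL (no rows) when it has fewer than two members
    mycomb <- function(x) if(length(x) > 1) data.table(t(combn(x, 2)))
    cstack <- cstack[, mycomb(name_id), by = "assignment"]
    # Reorder and rename the columns to from, to, assignment
    setcolorder(cstack, c(2,3,1))
    setnames(cstack, c("V1", "V2"), c("from", "to"))
    cstack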
    #>           from      to      assignment
    #>     1: A000374 A000370     Agriculture
    #>     2: A000374 A000372     Agriculture
    #>     3: A000374 A000378     Agriculture
    #>     4: A000374 B001298     Agriculture
    #>     5: A000374 B001307     Agriculture
    #>    ---                                
    #> 12957: C001053 L000491  Ranking Member
    #> 12958: C001053 R000582  Ranking Member
    #> 12959: D000619 L000491  Ranking Member
    #> 12960: D000619 R000582  Ranking Member
    #> 12961: L000491 R000582  Ranking Member
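
The same guard can be tried without data.table or the remote file. The following is a minimal base R sketch on the six sample rows from the question. The helper name pair_members is made up for illustration; it returns NULL for committees with fewer than two members, and rbind skips those NULL entries.

```r
# Sample rows copied from the question
cstack <- data.frame(
  name_id = c("A000374", "A000370", "A000055", "A000371", "A000372", "A000376"),
  assignment = c("Agriculture", "Agriculture", "Appropriations",
                 "Appropriations", "Agriculture", "Foreign"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all pairs within one committee,
# or NULL when the committee has fewer than two members
pair_members <- function(ids, committee) {
  if (length(ids) < 2) return(NULL)
  pairs <- t(combn(ids, 2))
  data.frame(from = pairs[, 1], to = pairs[, 2], committee = committee,
             stringsAsFactors = FALSE)
}

# Split ids by committee, build the pairs per committee, and stack the results
by_committee <- split(cstack$name_id, cstack$assignment)
edges <- do.call(rbind, Map(pair_members, by_committee, names(by_committee)))
rownames(edges) <- NULL
edges
```

On this sample the result has three Agriculture pairs and one Appropriations pair, and the single-member Foreign committee contributes nothing.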