
Finding indirect nodes for every edge (in R)


I have information on groups of physicians working together in given hospitals. A physician can work in more than one hospital at the same time. I would like to write code that outputs all indirect colleagues of a given physician working in a given hospital. For instance, if I work in a hospital with another physician who also works in a second hospital, I would like to know which physicians my colleague works with in that second hospital.

Consider a simple example of three hospitals (1, 2, 3) and five physicians (A, B, C, D, E). Physicians A, B and C work together in hospital 1. Physicians A, B and D work together in hospital 2. Physicians B and E work together in hospital 3.

For each physician working in a given hospital, I would like information on their indirect colleagues through each of their direct colleagues. For example, physician A has one indirect colleague through physician B in hospital 1: this is physician E in hospital 3. On the other hand, physician B does not have any indirect colleague through physician A in hospital 1. Physician C has two indirect colleagues through physician B in hospital 1: they are physician D in hospital 2 and physician E in hospital 3. And so on.
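One way to read this definition can be sketched in base R. The `works` table and the `colleagues()` and `indirect()` helpers are hypothetical names introduced only for illustration. The sketch assumes that an indirect colleague of `from` through `to` is anyone `to` works with who is neither `from` nor already a direct colleague of `from`.

```r
# Hypothetical membership table: one row per (hospital, physician) pair.
works <- data.frame(
  hosp = c("1", "1", "1", "2", "2", "2", "3", "3"),
  phys = c("A", "B", "C", "A", "B", "D", "B", "E"),
  stringsAsFactors = FALSE
)

# Direct colleagues of physician `p` across all hospitals (excluding `p`).
colleagues <- function(p) {
  hosps <- works$hosp[works$phys == p]
  setdiff(unique(works$phys[works$hosp %in% hosps]), p)
}

# Indirect colleagues of `from` through `to` (assumed definition, see above).
indirect <- function(from, to) {
  setdiff(colleagues(to), c(from, colleagues(from)))
}

indirect("A", "B")  # "E"
indirect("B", "A")  # character(0)
indirect("C", "B")  # "D" "E"
```

The three calls reproduce the three cases described above.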

Below is the object that describes the networks of physicians in all hospitals:

library(dplyr)  # provides tibble(), arrange() and %>%

edges <- tibble(hosp = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3"),
                from = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B", "D", "D", "B", "E"),
                to   = c("C", "B", "C", "A", "B", "A", "D", "B", "A", "D", "A", "B", "E", "B")) %>% arrange(hosp, from, to)

I would like code that produces the following output:

output <- tibble(hosp     = c("1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3"), 
             from     = c("A", "A", "B", "B", "C", "C", "C", "A", "A", "B", "B", "D", "D", "D", "B", "E", "E", "E", "E"), 
             to       = c("C", "B", "C", "A", "B", "A", "B", "D", "B", "A", "D", "A", "B", "B", "E", "B", "B", "B", "B"),
             hosp_ind = c("" , "3", "" , "" , "2", "2", "3", "" , "3", "" , "" , "1", "1", "3", "" , "1", "1", "2", "2"),
             to_ind   = c("" , "E", "" , "" , "D", "D", "E", "" , "E", "" , "" , "C", "C", "E", "" , "A", "C", "A", "D")) %>% arrange(hosp, from, to)

Solution

  • Here is one option using igraph + data.table

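    library(igraph)
    library(data.table)

    # graph_from_data_frame() builds edges from the first two columns (hosp, from),
    # so g is a bipartite graph linking each physician to their hospitals
    g <- simplify(graph_from_data_frame(edges, directed = FALSE))
    res <- setDT(edges)[
      ,
      c(.SD, {
        # physicians at distance exactly 2 are colleagues; keep those of `to`
        # that are not colleagues of `from`, then drop `from` itself
        to_ind <- setdiff(
          do.call(
            setdiff,
            Map(names, ego(g, 2, c(to, from), mindist = 2))
          ), from
        )
        if (!length(to_ind)) {
          # no indirect colleague through this edge
          hosp_ind <- to_ind <- NA_character_
        } else {
          # hospitals of each indirect colleague (all shared with `to` in this example)
          hosp_ind <- lapply(to_ind, function(v) names(neighbors(g, v)))
        }
        data.table(
          hosp_ind = unlist(hosp_ind),
          to_ind = rep(to_ind, lengths(hosp_ind))
        )
      }),
      # process each row of edges separately
      .(id = seq(nrow(edges)))
    ][, id := NULL][]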
    

    and you will obtain

    > res
        hosp from to hosp_ind to_ind
     1:    1    A  B        3      E
     2:    1    A  C     <NA>   <NA>
     3:    1    B  A     <NA>   <NA>
     4:    1    B  C     <NA>   <NA>
     5:    1    C  A        2      D
     6:    1    C  B        2      D
     7:    1    C  B        3      E
     8:    2    A  B        3      E
     9:    2    A  D     <NA>   <NA>
    10:    2    B  A     <NA>   <NA>
    11:    2    B  D     <NA>   <NA>
    12:    2    D  A        1      C
    13:    2    D  B        1      C
    14:    2    D  B        3      E
    15:    3    B  E     <NA>   <NA>
    16:    3    E  B        1      A
    17:    3    E  B        2      A
    18:    3    E  B        1      C
    19:    3    E  B        2      D
    

    Also, running plot(g) draws the graph, with hospitals 1 to 3 and physicians A to E as vertices and an edge between each physician and every hospital where they work.
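The `ego()` step in the code above depends on `g` being a hospital-physician graph, so it may help to look at one edge in isolation. The following is a minimal sketch for the edge A -> B only. The `works` data frame, `coll_A`, and `coll_B` are names introduced here for illustration; `works` is assumed to hold the same hospital memberships as the question's `edges`.

```r
library(igraph)

# Assumed membership table: one row per (hospital, physician) pair.
works <- data.frame(
  hosp = c("1", "1", "1", "2", "2", "2", "3", "3"),
  phys = c("A", "B", "C", "A", "B", "D", "B", "E"),
  stringsAsFactors = FALSE
)

# The first two columns become the edge list: hospital <-> physician.
g <- graph_from_data_frame(works, directed = FALSE)

# Vertices at distance exactly 2 from a physician are their direct colleagues
# (distance 1 reaches their hospitals, distance 2 the people working there).
coll_B <- names(ego(g, order = 2, nodes = "B", mindist = 2)[[1]])
coll_A <- names(ego(g, order = 2, nodes = "A", mindist = 2)[[1]])

sort(coll_B)  # "A" "C" "D" "E"
sort(coll_A)  # "B" "C" "D"

# Indirect colleagues of A through B: colleagues of B that are not
# colleagues of A, and not A itself.
to_ind <- setdiff(setdiff(coll_B, coll_A), "A")
to_ind  # "E"

# Hospitals where that indirect colleague works.
names(neighbors(g, "E"))  # "3"
```

This matches the rows for `from = A`, `to = B` in the result: the only indirect colleague is E, in hospital 3. Note that `neighbors()` returns every hospital of the indirect colleague; with other data, one may want to keep only the hospitals that colleague shares with `to`.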