I would like to solve the following problem using dplyr in R. This question has been answered using data.table here: Finding indirect nodes for every edge (in R), but because the remainder of my code uses dplyr I need to adapt it.
I have information on groups of physicians working together in given hospitals. A physician can work in more than one hospital at the same time. I would like to write code that lists all indirect colleagues of a given physician working in a given hospital. For instance, if I work in a given hospital with another physician who also works in a second hospital, I would like to know which physicians my colleague works with in that second hospital.
Consider a simple example of three hospitals (1, 2, 3) and five physicians (A, B, C, D, E). Physicians A, B and C work together in hospital 1. Physicians A, B and D work together in hospital 2. Physicians B and E work together in hospital 3.
For each physician working in a given hospital, I would like information on their indirect colleagues through each of their direct colleagues. For example, physician A has one indirect colleague through physician B in hospital 1: physician E in hospital 3. On the other hand, physician B does not have any indirect colleague through physician A in hospital 1. Physician C has two indirect colleagues through physician B in hospital 1: physician D in hospital 2 and physician E in hospital 3. And so on.
Below is the object that describes the networks of physicians in all hospitals:
library(dplyr)

edges <- tibble(hosp = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3"),
                from = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B", "D", "D", "B", "E"),
                to = c("C", "B", "C", "A", "B", "A", "D", "B", "A", "D", "A", "B", "E", "B")) %>% arrange(hosp, from, to)
I would like code that produces the following output:
output <- tibble(hosp = c("1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3"),
                 from = c("A", "A", "B", "B", "C", "C", "C", "A", "A", "B", "B", "D", "D", "D", "B", "E", "E", "E", "E"),
                 to = c("C", "B", "C", "A", "B", "A", "B", "D", "B", "A", "D", "A", "B", "B", "E", "B", "B", "B", "B"),
                 hosp_ind = c("" , "3", "" , "" , "2", "2", "3", "" , "3", "" , "" , "1", "1", "3", "" , "1", "1", "2", "2"),
                 to_ind = c("" , "E", "" , "" , "D", "D", "E", "" , "E", "" , "" , "C", "C", "E", "" , "A", "C", "A", "D")) %>% arrange(hosp, from, to)
Actually, you can translate the data.table solution into dplyr in the following manner:
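library(dplyr)
library(igraph)

# graph_from_data_frame() uses the first two columns (hosp, from) as edges,
# so g links each hospital to the physicians working there
g <- simplify(graph_from_data_frame(edges, directed = FALSE))

edges %>%
  rowwise() %>%
  do(cbind(., {
    # physicians exactly two steps from `to` but not from `from`, excluding `from`
    to_ind <- setdiff(
      do.call(
        setdiff,
        Map(names, ego(g, 2, c(.$to, .$from), mindist = 2))
      ), .$from
    )
    if (!length(to_ind)) {
      # no indirect colleague through this edge
      hosp_ind <- to_ind <- NA_character_
    } else {
      # hospitals of each indirect colleague
      hosp_ind <- lapply(to_ind, function(v) names(neighbors(g, v)))
    }
    data.frame(
      hosp_ind = unlist(hosp_ind),
      to_ind = rep(to_ind, lengths(hosp_ind))
    )
  }))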
which gives you
# A tibble: 19 x 5
   hosp  from  to    hosp_ind to_ind
   <chr> <chr> <chr> <chr>    <chr>
 1 1     A     B     3        E
 2 1     A     C     NA       NA
 3 1     B     A     NA       NA
 4 1     B     C     NA       NA
 5 1     C     A     2        D
 6 1     C     B     2        D
 7 1     C     B     3        E
 8 2     A     B     3        E
 9 2     A     D     NA       NA
10 2     B     A     NA       NA
11 2     B     D     NA       NA
12 2     D     A     1        C
13 2     D     B     1        C
14 2     D     B     3        E
15 3     B     E     NA       NA
16 3     E     B     1        A
17 3     E     B     2        A
18 3     E     B     1        C
19 3     E     B     2        D
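If you would rather avoid igraph altogether, the same table can probably be built with joins alone. The following is only a sketch, not a translation of the data.table answer: it assumes dplyr >= 1.1.0 (for the `relationship` argument), and the names `direct`, `indirect` and `result` are made up for illustration. Note one difference in logic: it reports the hospitals in which `to` and `to_ind` actually work together, rather than every hospital of `to_ind`. On the example data the two coincide.

```r
library(dplyr)

edges <- tibble(
  hosp = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3"),
  from = c("A", "A", "B", "B", "C", "C", "A", "A", "B", "B", "D", "D", "B", "E"),
  to   = c("C", "B", "C", "A", "B", "A", "D", "B", "A", "D", "A", "B", "E", "B")
)

# all direct colleague pairs, regardless of hospital
direct <- distinct(edges, from, to)

indirect <- edges %>%
  # for each edge (hosp, from, to), attach every edge that leaves `to`
  inner_join(
    rename(edges, hosp_ind = hosp, to_ind = to),
    by = c("to" = "from"),
    relationship = "many-to-many"  # assumes dplyr >= 1.1.0
  ) %>%
  # a physician is not their own indirect colleague
  filter(to_ind != from) %>%
  # drop candidates who are already direct colleagues of `from`
  anti_join(direct, by = c("from" = "from", "to_ind" = "to"))

# keep edges without any indirect colleague as NA rows
result <- edges %>%
  left_join(indirect, by = c("hosp", "from", "to")) %>%
  arrange(hosp, from, to, to_ind, hosp_ind)
```

Here `result` has the same 19 rows as the table above, with `NA` in `hosp_ind` and `to_ind` for edges that have no indirect colleague, although rows within one edge may be ordered differently.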