I have the following dataframe in R:
friendships <- structure(list(person = c("person3", "person10", "person2", "person6",
"person4", "person6", "person10", "person5", "person3", "person9",
"person9", "person9", "person3", "person8", "person10", "person7",
"person10"),
friend = c("person9", "person4", "person1", "person7",
"person10", "person7", "person9", "person10", "person7", "person5",
"person7", "person5", "person6", "person9", "person2", "person5",
"person8")),
row.names = c(1L, 3L, 4L, 5L, 7L, 8L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L),
class = "data.frame")
For person10, I want to find out the names and numbers of neighbors of degree=2.
In igraph, I can do it like this:
library(igraph)
g <- graph_from_data_frame(friendships, directed = FALSE)
#names of neighbors at degree=2
> ego(g, 2, "person10")
[[1]]
+ 9/10 vertices, named, from 6632700:
[1] person10 person2 person4 person5 person9 person8 person1 person7 person3
#number of neighbors at degree=2
> length(unlist(ego(g, 2, "person10")))
[1] 9
My Question: Now, I want to do this only using SQL code and R functions/loops (i.e. no igraph).
Here is my attempt to do this:
library(sqldf)
#first locate neighbors of person10
friends <- sqldf("SELECT friend FROM friendships WHERE person = 'person10'")
# nxt, initialize the list of friends with degree 2
friends_degree2 <- friends
From here, I tried to write a loop:
while (TRUE) {
new_friends <- sqldf(paste0("SELECT friend FROM friendships WHERE person IN ('", paste(friends$friend, collapse="','"), "')"))
new_friends <- unique(new_friends)
if (all(new_friends$friend %in% friends_degree2$friend)) {
break
}
friends_degree2 <- unique(rbind(friends_degree2, new_friends))
friends <- new_friends
}
From here, we can check the answers:
> print(nrow(friends_degree2))
[1] 8
>
> print(friends_degree2)
friend
1 person4
2 person9
3 person2
4 person8
11 person1
21 person10
31 person5
41 person7
However, person3 is not included in this list?
Can someone please show me how to fix this?
The code in the question using ego
is giving the degree 1 and 2 neighbors. If that is what you want then compute the degree 1 neighbors and using those find the degree 2 neighbors and then return both or if you just want the degree 2 neighbors then remove union select f from deg1
in the code below.
library(sqldf)
sqldf("with deg1 as (
select friend f from friendships where person = 'person10'
union
select person f from friendships where friend = 'person10'
)
select friend f from friendships where person in (select f from deg1)
union
select person f from friendships where friend in (select f from deg1)
union select f from deg1"
)
## f
## 1 person1
## 2 person10
## 3 person2
## 4 person3
## 5 person4
## 6 person5
## 7 person7
## 8 person8
## 9 person9
Here is a variation that factors out the double lookup into a new table sym
. If you want just degree 2 then remove the last line that begins with union
and replace the last select
with select distinct
.
sqldf("with
sym as (
select * from friendships
union
select friend as person, person as friend from friendships
),
deg1 as (
select friend f from sym where person = 'person10'
)
select friend f from sym where person in (select f from deg1)
union select f from deg1"
)
## f
## 1 person1
## 2 person10
## 3 person2
## 4 person3
## 5 person4
## 6 person5
## 7 person7
## 8 person8
## 9 person9