Search code examples
sqlrigraph

Finding out Neighbors using SQL/R Loops


I have the following dataframe in R:

friendships <- structure(list(person = c("person3", "person10", "person2", "person6", 
                                         "person4", "person6", "person10", "person5", "person3", "person9", 
                                         "person9", "person9", "person3", "person8", "person10", "person7", 
                                         "person10"), 
                              friend = c("person9", "person4", "person1", "person7", 
                                         "person10", "person7", "person9", "person10", "person7", "person5", 
                                         "person7", "person5", "person6", "person9", "person2", "person5", 
                                         "person8")), 
                         row.names = c(1L, 3L, 4L, 5L, 7L, 8L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), 
                         class = "data.frame")

enter image description here

For person10, I want to find out the names and numbers of neighbors of degree=2.

In igraph, I can do it like this:

library(igraph)

g <- graph_from_data_frame(friendships, directed = FALSE)

#names of neighbors at degree=2
> ego(g, 2, "person10")

[[1]]
+ 9/10 vertices, named, from 6632700:
[1] person10 person2  person4  person5  person9  person8  person1  person7  person3 

#number of neighbors at degree=2

 > length(unlist(ego(g, 2, "person10")))
[1] 9

My Question: Now, I want to do this only using SQL code and R functions/loops (i.e. no igraph).

Here is my attempt to do this:

library(sqldf)

#first locate neighbors of person10
friends <- sqldf("SELECT friend FROM friendships WHERE person = 'person10'")

# nxt, initialize the list of friends with degree 2
friends_degree2 <- friends

From here, I tried to write a loop:

    while (TRUE) {
        new_friends <- sqldf(paste0("SELECT friend FROM friendships WHERE person IN ('", paste(friends$friend, collapse="','"), "')"))
        
        new_friends <- unique(new_friends)
        
        if (all(new_friends$friend %in% friends_degree2$friend)) {
            break
        }
      
        friends_degree2 <- unique(rbind(friends_degree2, new_friends))
        
        friends <- new_friends
    }

From here, we can check the answers:

    > print(nrow(friends_degree2))
    [1] 8
    > 
    > print(friends_degree2)
         friend
    1   person4
    2   person9
    3   person2
    4   person8
    11  person1
    21 person10
    31  person5
    41  person7

However, person3 is not included in this list?

Can someone please show me how to fix this?


Solution

  • The code in the question using ego is giving the degree 1 and 2 neighbors. If that is what you want then compute the degree 1 neighbors and using those find the degree 2 neighbors and then return both or if you just want the degree 2 neighbors then remove union select f from deg1 in the code below.

    library(sqldf)
    sqldf("with deg1 as (
       select friend f from friendships where person = 'person10'
       union
       select person f from friendships where friend = 'person10'
      )
      select friend f from friendships where person in (select f from deg1)
      union 
      select person f from friendships where friend in (select f from deg1)
      union select f from deg1"
    )
    ##          f
    ## 1  person1
    ## 2 person10
    ## 3  person2
    ## 4  person3
    ## 5  person4
    ## 6  person5
    ## 7  person7
    ## 8  person8
    ## 9  person9
    

    Here is a variation that factors out the double lookup into a new table sym. If you want just degree 2 then remove the last line that begins with union and replace the last select with select distinct.

    sqldf("with
      sym as (
        select * from friendships
        union
        select friend as person, person as friend from friendships
      ),
      deg1 as (
        select friend f from sym where person = 'person10'
      )
      select friend f from  sym where person in (select f from deg1)
      union select f from deg1"
    )
    ##          f
    ## 1  person1
    ## 2 person10
    ## 3  person2
    ## 4  person3
    ## 5  person4
    ## 6  person5
    ## 7  person7
    ## 8  person8
    ## 9  person9