Finding out Neighbors of a Node using SQL

I have the following table in SQL:

CREATE TABLE friendships 
(
    id INT PRIMARY KEY,
    person VARCHAR(50),
    friend VARCHAR(50)
);

INSERT INTO friendships (id, person, friend) VALUES
(1, 'person3', 'person9'),
(3, 'person10', 'person4'),
(4, 'person2', 'person1'),
(5, 'person6', 'person7'),
(7, 'person4', 'person10'),
(8, 'person6', 'person7'),
(10, 'person10', 'person9'),
(11, 'person5', 'person10'),
(12, 'person3', 'person7'),
(13, 'person9', 'person5'),
(14, 'person9', 'person7'),
(15, 'person9', 'person5'),
(16, 'person3', 'person6'),
(17, 'person8', 'person9'),
(18, 'person10', 'person2'),
(19, 'person7', 'person5'),
(20, 'person10', 'person8');

Using R, I can plot the table the data like this:

library(igraph)

friendships = structure(list(person = c("person3", "person10", "person2", "person6", 
"person4", "person6", "person10", "person5", "person3", "person9", 
"person9", "person9", "person3", "person8", "person10", "person7", 
"person10"), friend = c("person9", "person4", "person1", "person7", 
"person10", "person7", "person9", "person10", "person7", "person5", 
"person7", "person5", "person6", "person9", "person2", "person5", 
"person8")), row.names = c(1L, 3L, 4L, 5L, 7L, 8L, 10L, 11L, 
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), class = "data.frame")


g <- graph_from_data_frame(df, directed = FALSE)
plot(g, vertex.size=10, vertex.label.cex=0.8)

ONLY using SQL commands, I want to answer the following questions (in the future, generalize to the n-th degree):

How many nodes are connected to Person10 by degree2 (i.e. friends of friends,, 8 people)
Who are these people that are connected to person10 by degree 2? (i.e. person2, person1, person4, person8, person9, person5, person7, person3)

I am not sure how to do these in SQL - I normally use igraph for these problems.

Here is my attempt to answer the first one:

SELECT COUNT(DISTINCT f2.friend)
FROM friendships f1
JOIN friendships f2 ON f1.friend = f2.person
WHERE f1.person = 'person10' AND f2.friend != 'person10' AND f2.friend NOT IN (
    SELECT friend FROM friendships WHERE person = 'person10'
);

#result
 3

Here is my attempt to answer the second one:

SELECT DISTINCT f2.friend
FROM friendships f1
JOIN friendships f2 ON f1.friend = f2.person
WHERE f1.person = 'person10' AND f2.friend != 'person10' AND f2.friend NOT IN (
    SELECT friend FROM friendships WHERE person = 'person10'
);

#result
1 person5
2 person7
3 person1

However these are not the correct results.

Can someone please show me how to do this correctly? Perhaps a loop/function can be written in R that executes the SQL until termination?

I am using an older version of SQL that doesn't have recursive functions.

Thanks!

Solution

Consider this recursive example.
There condition where person='person10' for select one person, otherwise - find level2 friends for all persons.
Condition lvl<2 - count friends for level 1 and level 2. You can set any possible level.
And path column only for clarity, if graph not have cycles.

with t0 as (
select person,friend from friendships
-- where person='person10'
  union
select friend,person from friendships
-- where friend='person10'
)
, r as(
select person,friend,1 lvl
  ,cast(concat(person,'-',friend) as varchar(1000))path
from t0
union all
select r.person,t.friend ,r.lvl+1 lvl
  ,cast(concat(path,'-',t.friend) as varchar(1000)) path
from r
inner join friendships t on (t.person=r.friend and t.friend<>r.person)
where lvl<2
union all
select r.person,t.person ,r.lvl+1 lvl
  ,cast(concat(path,'-',t.person)  as varchar(1000))path
from r
inner join friendships t on (t.friend=r.friend and t.person<>r.person)
where lvl<2
)
,allfriends as (
select person,friend ,count(*) qty
from r
group by person,friend
)
select person,count(*)friend_qty
from allfriends
group by person

Result for all person

person	friend_qty
person1	2
person10	8
person2	6
person3	6
person4	5
person5	8
person6	4
person7	6
person8	7
person9	8

For person='person10' cte r bypasses the graph along all paths.

person	friend	lvl	path
person10	person2	1	person10-person2
person10	person4	1	person10-person4
person10	person5	1	person10-person5
person10	person8	1	person10-person8
person10	person9	1	person10-person9
person10	person1	2	person10-person2-person1
person10	person3	2	person10-person9-person3
person10	person5	2	person10-person9-person5
person10	person5	2	person10-person9-person5
person10	person7	2	person10-person5-person7
person10	person7	2	person10-person9-person7
person10	person8	2	person10-person9-person8
person10	person9	2	person10-person8-person9
person10	person9	2	person10-person5-person9
person10	person9	2	person10-person5-person9

AllFriends for person10 (level up 2) and count of path for friend

person	friend	qty
person10	person1	1
person10	person2	1
person10	person3	1
person10	person4	1
person10	person5	3
person10	person7	2
person10	person8	2
person10	person9	4

There 2 path's (10-4),(4-10) counted as 1 path - in first cte we use union (not union all).
For pair (10,9) there 4 path's (10-9),(10-8-9),(10-5-9) and (10-5-9), on other levels we not exclude paths (union all). Possible left only unique path's by grouping.

Output for lvl=1

person	friend	qty
person10	person2	1
person10	person4	1
person10	person5	1
person10	person8	1
person10	person9	1

Demo here

Example for SQL server. Virtually unchanged (varchar(1000)->nchar(1000)) suitable for MySQL.

Unfortunately, I don't know how to do extended recursion in postgresql like (select .. union all select ... union all select ...)
We can do this in postgresql by making it more difficult join anchor and the table.

Upd1. If graph have cycles, we mast check - we have already passed this node - is there a node in the path? Such a check depends somewhat on the DBMS used (instr,patindex ...). In this case, the path is mandatory.