I have the following table in SQL:
CREATE TABLE friendships
(
id INT PRIMARY KEY,
person VARCHAR(50),
friend VARCHAR(50)
);
INSERT INTO friendships (id, person, friend) VALUES
(1, 'person3', 'person9'),
(3, 'person10', 'person4'),
(4, 'person2', 'person1'),
(5, 'person6', 'person7'),
(7, 'person4', 'person10'),
(8, 'person6', 'person7'),
(10, 'person10', 'person9'),
(11, 'person5', 'person10'),
(12, 'person3', 'person7'),
(13, 'person9', 'person5'),
(14, 'person9', 'person7'),
(15, 'person9', 'person5'),
(16, 'person3', 'person6'),
(17, 'person8', 'person9'),
(18, 'person10', 'person2'),
(19, 'person7', 'person5'),
(20, 'person10', 'person8');
Using R, I can plot the table the data like this:
library(igraph)
friendships = structure(list(person = c("person3", "person10", "person2", "person6",
"person4", "person6", "person10", "person5", "person3", "person9",
"person9", "person9", "person3", "person8", "person10", "person7",
"person10"), friend = c("person9", "person4", "person1", "person7",
"person10", "person7", "person9", "person10", "person7", "person5",
"person7", "person5", "person6", "person9", "person2", "person5",
"person8")), row.names = c(1L, 3L, 4L, 5L, 7L, 8L, 10L, 11L,
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), class = "data.frame")
g <- graph_from_data_frame(df, directed = FALSE)
plot(g, vertex.size=10, vertex.label.cex=0.8)
ONLY using SQL commands, I want to answer the following questions (in the future, generalize to the n-th degree):
I am not sure how to do these in SQL - I normally use igraph for these problems.
Here is my attempt to answer the first one:
SELECT COUNT(DISTINCT f2.friend)
FROM friendships f1
JOIN friendships f2 ON f1.friend = f2.person
WHERE f1.person = 'person10' AND f2.friend != 'person10' AND f2.friend NOT IN (
SELECT friend FROM friendships WHERE person = 'person10'
);
#result
3
Here is my attempt to answer the second one:
SELECT DISTINCT f2.friend
FROM friendships f1
JOIN friendships f2 ON f1.friend = f2.person
WHERE f1.person = 'person10' AND f2.friend != 'person10' AND f2.friend NOT IN (
SELECT friend FROM friendships WHERE person = 'person10'
);
#result
1 person5
2 person7
3 person1
However these are not the correct results.
Can someone please show me how to do this correctly? Perhaps a loop/function can be written in R that executes the SQL until termination?
I am using an older version of SQL that doesn't have recursive functions.
Thanks!
Consider this recursive example.
There condition where person='person10'
for select one person, otherwise - find level2 friends for all persons.
Condition lvl<2
- count friends for level 1 and level 2. You can set any possible level.
And path
column only for clarity, if graph not have cycles.
with t0 as (
select person,friend from friendships
-- where person='person10'
union
select friend,person from friendships
-- where friend='person10'
)
, r as(
select person,friend,1 lvl
,cast(concat(person,'-',friend) as varchar(1000))path
from t0
union all
select r.person,t.friend ,r.lvl+1 lvl
,cast(concat(path,'-',t.friend) as varchar(1000)) path
from r
inner join friendships t on (t.person=r.friend and t.friend<>r.person)
where lvl<2
union all
select r.person,t.person ,r.lvl+1 lvl
,cast(concat(path,'-',t.person) as varchar(1000))path
from r
inner join friendships t on (t.friend=r.friend and t.person<>r.person)
where lvl<2
)
,allfriends as (
select person,friend ,count(*) qty
from r
group by person,friend
)
select person,count(*)friend_qty
from allfriends
group by person
Result for all person
person | friend_qty |
---|---|
person1 | 2 |
person10 | 8 |
person2 | 6 |
person3 | 6 |
person4 | 5 |
person5 | 8 |
person6 | 4 |
person7 | 6 |
person8 | 7 |
person9 | 8 |
For person='person10' cte r
bypasses the graph along all paths.
person | friend | lvl | path |
---|---|---|---|
person10 | person2 | 1 | person10-person2 |
person10 | person4 | 1 | person10-person4 |
person10 | person5 | 1 | person10-person5 |
person10 | person8 | 1 | person10-person8 |
person10 | person9 | 1 | person10-person9 |
person10 | person1 | 2 | person10-person2-person1 |
person10 | person3 | 2 | person10-person9-person3 |
person10 | person5 | 2 | person10-person9-person5 |
person10 | person5 | 2 | person10-person9-person5 |
person10 | person7 | 2 | person10-person5-person7 |
person10 | person7 | 2 | person10-person9-person7 |
person10 | person8 | 2 | person10-person9-person8 |
person10 | person9 | 2 | person10-person8-person9 |
person10 | person9 | 2 | person10-person5-person9 |
person10 | person9 | 2 | person10-person5-person9 |
AllFriends
for person10
(level up 2) and count of path for friend
person | friend | qty |
---|---|---|
person10 | person1 | 1 |
person10 | person2 | 1 |
person10 | person3 | 1 |
person10 | person4 | 1 |
person10 | person5 | 3 |
person10 | person7 | 2 |
person10 | person8 | 2 |
person10 | person9 | 4 |
There 2 path's (10-4),(4-10) counted as 1 path - in first cte we use union
(not union all
).
For pair (10,9) there 4 path's (10-9),(10-8-9),(10-5-9) and (10-5-9), on other levels we not exclude paths (union all
). Possible left only unique path's by grouping.
Output for lvl=1
person | friend | qty |
---|---|---|
person10 | person2 | 1 |
person10 | person4 | 1 |
person10 | person5 | 1 |
person10 | person8 | 1 |
person10 | person9 | 1 |
Example for SQL server. Virtually unchanged (varchar(1000)->nchar(1000)) suitable for MySQL.
Unfortunately, I don't know how to do extended recursion in postgresql like (select .. union all select ... union all select ...)
We can do this in postgresql by making it more difficult join anchor and the table.
Upd1. If graph have cycles, we mast check - we have already passed this node - is there a node in the path
? Such a check depends somewhat on the DBMS used (instr,patindex ...). In this case, the path is mandatory.