Search code examples
sqlalgorithmsocial-networkingsimilarity

How to count common neighbors of two users and calculate similarity?


The monthly_connections table contains columns calling_party, called_party, common_neighbors, neighborhood_overlap

So the table describes which users are connected. One of the measures for user similarity is neighborhood overlap which is defined as following:

neighborhood_overlap = (number of nodes who are neighbors of both calling_party and called_party)/(number of nodes who are neighbors of at least one of calling_party or called_party)

Trying to calculate the number of common neighbors for two users I wrote the following query:

SELECT 
COUNT (*) FROM 
(SELECT t1.neighborA 
     FROM (
          SELECT called_party AS neighborA FROM monthly_connections 
          WHERE calling_party = '9F7334BCF9000CD68D40302DC4801E60C027A7D1' 
          UNION SELECT calling_party AS neighborA FROM monthly_connections
                WHERE called_party = '9F7334BCF9000CD68D40302DC4801E60C027A7D1') t1                  
          INNER JOIN (SELECT called_party AS neighborB FROM monthly_connections 
                      WHERE calling_party = '10D149A4356E1AA3A8AF604BD992BBA141DB53D2'
                      UNION SELECT calling_party AS neighborB FROM monthly_connections
                            WHERE called_party = '10D149A4356E1AA3A8AF604BD992BBA141DB53D2') t2 ON t1.neighborA = t2.neighborB) t3

The query above calculates the number of common neighbors of users 10D149A4356E1AA3A8AF604BD992BBA141DB53D2 and 9F7334BCF9000CD68D40302DC4801E60C027A7D1

The goal is to write the query to set the value of column common neighbors and neighborhood overlap for each pair of connection in the table

Does anyone know how to write the query to update columns common_neighbors and neighborhood_overlap?

For common neighbors I started to write the following query but it is not correct:

 UPDATE mc SET
    common_neighbors = 
    (SELECT COUNT (*) FROM 
(SELECT t1.neighborA FROM (SELECT called_party AS neighborA FROM monthly_connections WHERE calling_party = mc.calling_party UNION SELECT calling_party AS neighborA FROM monthly_connections WHERE called_party = mc.calling_party) t1 INNER JOIN (SELECT called_party AS neighborB FROM monthly_connections WHERE calling_party = mc.called_party UNION SELECT calling_party AS neighborB FROM monthly_connections WHERE called_party = mc.called_party) t2 ON t1.neighborA = t2.neighborB) t3) FROM monthly_connections mc INNER JOIN t3 ON t3.calling_party = mc.calling_party AND t3.called_party = mc.called_party

Solution

  • I think this query works (though might not be that performant).

    UPDATE mc 
       SET common_neighbors = (SELECT COUNT (*) FROM
         (
          (SELECT called_party FROM monthly_connections 
            WHERE calling_party = mc.calling_party
             UNION
           SELECT calling_party FROM monthly_connections 
            WHERE called_party = mc.calling_party
          )
            INTERSECT
          (SELECT calling_party FROM monthly_connections 
           WHERE called_party = mc.called_party
             UNION
           SELECT called_party FROM monthly_connections 
            WHERE calling_party = mc.called_party
          )
         ) t1
       ) FROM monthly_connections mc