Search code examples
sqlsql-serverselectsql-server-2017dataexplorer

SQL Find Reciprocal Relationship


I am trying to find a situation using the Stack Exchange Data Explorer (SEDE) where two distinct users on Stack Overflow have accepted an answer from each other. So for example:

Post A { Id: 1, OwnerUserId: "user1", AcceptedAnswerId: "user2" }

and

Post B { Id: 2, OwnerUserId: "user2", AcceptedAnswerId: "user1" }

I currently have a query that can find two users that have collaborated on more than question as questioner-answerer but it does not determine if that relationship is reciprocal:

SELECT user1.Id AS User_1, user2.Id AS User_2
FROM Posts p
INNER JOIN Users user1 ON p.OwnerUserId = user1.Id
INNER JOIN Posts p2 ON p.AcceptedAnswerId = p2.Id
INNER JOIN Users user2 ON p2.OwnerUserId = user2.Id
WHERE p.OwnerUserId <> p2.OwnerUserId
AND p.OwnerUserId IS NOT NULL
AND p2.OwnerUserId IS NOT NULL
AND user1.Id <> user2.Id
GROUP BY user1.Id, user2.Id HAVING COUNT(*) > 1;

For anyone unfamiliar with the schema, there are two tables like so:

Posts
--------------------------------------
Id                      int
PostTypeId              tinyint
AcceptedAnswerId        int
ParentId                int
CreationDate            datetime
DeletionDate            datetime
Score                   int
ViewCount               int
Body                    nvarchar (max)
OwnerUserId             int
OwnerDisplayName        nvarchar (40)
LastEditorUserId        int
LastEditorDisplayName   nvarchar (40)
LastEditDate            datetime
LastActivityDate        datetime
Title                   nvarchar (250)
Tags                    nvarchar (250)
AnswerCount             int
CommentCount            int
FavoriteCount           int
ClosedDate              datetime
CommunityOwnedDate      datetime

And

Users
--------------------------------------
Id                      int
Reputation              int
CreationDate            datetime
DisplayName             nvarchar (40)
LastAccessDate          datetime
WebsiteUrl              nvarchar (200)
Location                nvarchar (100)
AboutMe                 nvarchar (max)
Views                   int
UpVotes                 int
DownVotes               int
ProfileImageUrl         nvarchar (200)
EmailHash               varchar (32)
AccountId               int

Solution

  • One CTE and simple inner joins will do the job. There's no need for so much code as I've observed in other answers. Note a lot of comments in mine.

    Link to StackExchange Data Explorer with sample result saved

    with questions as ( -- this is needed so that we have ids of users asking and answering
    select
       p1.owneruserid as question_userid
     , p2.owneruserid as answer_userid
     --, p1.id -- to view sample ids
    from posts p1
    inner join posts p2 on -- to fetch answer post
      p1.acceptedanswerid = p2.id
    )
    select distinct -- unique pairs
        q1.question_userid as userid1
      , q1.answer_userid as userid2
      --, q1.id, q2.id -- to view sample ids
    from questions q1
    inner join questions q2 on
          q1.question_userid = q2.answer_userid -- accepted answer from someone
      and q1.answer_userid = q2.question_userid -- who also accepted our answer
      and q1.question_userid <> q1.answer_userid -- and we aren't self-accepting
    

    This brings as an example posts:

    Though, StackExchange may throw you timeout because of large dataset and the distinct part. If you wish to view some data, remove distinct and add top N at start:

    with questions as (
    ...
    )
    select top 3 ...