Search code examples
sqlscalabilitydata-modelingrelationships

SQL Modeling Follower/Followed Relationships For Social Networking


I am building a social graph for my website. Users will create relationships (of the form follower/followed) where each party may independently follow the other. My users table looks like this:

Users table
 - UserId (PK, Auto-incrementing integer)

Thinking of how to model this, I've come up with several alternatives, such as:

(a) A table holds each 'follow' action as a separate row.

Relationships table
 - FollowerId (FK to Users.UserId)
 - FollowedId (FK to Users.UserId)

This has the drawback that given many users, it would create a huge number of rows.

(b) A table holds the list of users each user is following as a CSV or other structure:

Relationships table
 - FollowerId (FK to Users.UserId)
 - FollowingUsers (e.g. 2,488,28,40)

This has the drawback that queries will be much more complicated (and costly?). I'd also have to maintain ordering of the string values, etc...

(c) A relationship per row, where a user may be on either 'side' of the relationship:

Relationships table
 - Party1Id (FK to Users.UserId)
 - FollowingParty2 (boolean)
 - Party2Id (FK to Users.UserId)
 - FollowingParty1 (boolean)

This saves rows over (a), but queries are more complex because the user may be either party.

(d) Placing both 'following' and 'followed by' as lists like (b)

Relationships table
 - UserId (FK to Users.UserId)
 - FollowingUsers (e.g. 2,488,28,40)
 - FollowedBy (e.g. 2,488,28,40)

This seems like the best of all worlds, but now I have to use transactions to update multiple rows.

Assuming I am looking to scale to a large size, although aware that "Facebook's problems are not my problems" - which option, or which other option is preferred?


Solution

  • I'd go with option A.

    1. Any kind of social graph analysis will be impossible using other options
    2. Enforcing any kind of relational constraints will be impossible using other options
    3. There's no need to use relational database if you don't plan on storing data in a relational way.

    One interesting option might be to consider relationships table model:

    Relationships table

    • RelationshipId
    • UserId (FK to Users.UserId)
    • RelationType

    You can now connect users.

    case B follows A:

    • add RelationshipId1, UserAId, "IsFollowed"
    • add RelationshipId1, UserBId, "IsFollowing"

    case Another user starts following A:

    • add RelationshipId1, AnotherUserId, "IsFollowing"

    case Another user starts following B:

    • add RelationshipId2, AnotherUserId, "IsFollowing"

    You can even eliminate unneeded rows if you wish: A starts following B:

    • add RelationshipId3, UserAId, "IsFollowedAndIsFollowing"
    • add RelationshipId3, UserBId, "IsFollowedAndIsFollowing"
    • remove RelationshipId1, UserBId, "IsFollowing"