I am building a social graph for my website. Users will create relationships (of the form follower/followed) where each party may independently follow the other. My users table looks like this:
Users table
- UserId (PK, Auto-incrementing integer)
Thinking of how to model this, I've come up with several alternatives, such as:
(a) A table holds each 'follow' action as a separate row.
Relationships table
- FollowerId (FK to Users.UserId)
- FollowedId (FK to Users.UserId)
This has the drawback that given many users, it would create a huge number of rows.
(b) A table holds the list of users each user is following as a CSV or other structure:
Relationships table
- FollowerId (FK to Users.UserId)
- FollowingUsers (e.g. 2,488,28,40)
This has the drawback that queries will be much more complicated (and costly?). I'd also have to maintain ordering of the string values, etc...
(c) A relationship per row, where a user may be on either 'side' of the relationship:
Relationships table
- Party1Id (FK to Users.UserId)
- FollowingParty2 (boolean)
- Party2Id (FK to Users.UserId)
- FollowingParty1 (boolean)
This saves rows over (a), but queries are more complex because the user may be either party.
(d) Placing both 'following' and 'followed by' as lists like (b)
Relationships table
- UserId (FK to Users.UserId)
- FollowingUsers (e.g. 2,488,28,40)
- FollowedBy (e.g. 2,488,28,40)
This seems like the best of all worlds, but now I have to use transactions to update multiple rows.
Assuming I am looking to scale to a large size, although aware that "Facebook's problems are not my problems" - which option, or which other option is preferred?
I'd go with option A.
One interesting option might be to consider relationships table model:
Relationships table
You can now connect users.
case B follows A:
case Another user starts following A:
case Another user starts following B:
You can even eliminate unneeded rows if you wish: A starts following B: