This is a question about tradeoffs.
Imagine a social network. Each user has a status message, that he can change anytime. Whenever he does change it, all his friends are notified through a wall (like in Facebook).
To make this work. We have 3 tables Users(id, name), FriendLists(userId, friendUserId), Notifications(?).
Now let's assume each user has approximately 50 friends in his friend list. I am faced with the dilemma - how to implement Notifications table.
CREATE TABLE Notifications
(
toUserId bigint NOT NULL,
[identity] bigint IDENTITY(1,1) NOT NULL,
fromUserId bigint NOT NULL,
data varchar(256) NOT NULL,
CONSTRAINT [PK_Notifications] PRIMARY KEY CLUSTERED (toUserId, [identity])
)
Send notifications:
-- Get all friends of @fromUserId.
WITH Friends AS
(SELECT FriendLists.friendUserId
FROM FriendLists
WHERE userId = @fromUserId)
-- Send updates to all friends.
SELECT
friendUserId as toUserId,
@fromUserId as fromUserId,
@data as data
INTO Notifications
FROM Friends
In this case, for each status change we create 50 records (assuming 50 friends). This is bad. However the good point is that to retrieve notifications for a specific user it is really fast, since we have a clustered index on the toUserId.
CREATE TABLE Notifications
(
toUserId bigint NOT NULL,
[identity] bigint IDENTITY(1,1) NOT NULL,
fromUserId bigint NOT NULL,
data varchar(256) NOT NULL,
CONSTRAINT [PK_Notifications] PRIMARY KEY CLUSTERED ([identity])
)
CREATE NONCLUSTERED INDEX [IX_toUserId] ON Notifications (toUserId ASC)
Send notifications:
-- Get all friends of @fromUserId.
WITH Friends AS
(SELECT FriendLists.friendUserId
FROM FriendLists
WHERE userId = @fromUserId)
-- Send updates to all friends.
INSERT INTO Notifications(toUserId, fromUserId, data)
VALUES(friendUserId, @fromUserId, @data)
Here we only insert a single record per status update. This is good. The bad point is that the retrieval of the notifications is going to be slower, since records are not clustered by toUserId.
Getting notifications is same for both methods:
SELECT TOP(50) fromUserId, [identity], data
FROM Notifications
WHERE toUserId = @toUserId
So what is your take on this?
First, reads are always going to be overwhelming in comparison with writes, because each 'wall' will be seen many more times than it will be updated. So you better make reads darn fast.
Second, one of the problem inherent in these kind of big social networking sites is the distribution of data (sharding, partitioning, no single database will ever be capable of storing all accounts, all friends, all notifications) which means that when a new notification is put on a wall, the friends have to be notified on other servers. This implies updates are asynchronous and messaging based anyway.
So I would definitely go with a structure optimized for reading.
I'd recommend you go over the public presentations done by various people involved in the architecture of sites like Facebook and MySpace, like this Christa Stelzmuller's one. They explain a lot of the thinking and reasoning that goes into their design.