I have a user feed of image posts. Each user can post single images, however, he can repeat the action often. Say, upload several images within an hour.
How do I effectively design the database table so when a user posts multiple images (one by one) within one hour — I can easily group those serial posts together, eigher on INSERT or on SELECT?
Don't suggest multi-upload form. That's not the case: I've just described the task in more common terms :)
That's out playground:
CREATE TABLE `feed`(
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`tm` INT UNSIGNED NOT NULL COMMENT 'timestamp',
`user_id` INT UNSIGNED NOT NULL COMMENT 'author id',
`image` VARCHAR(255) NOT NULL COMMENT 'posted image filename',
`group` INT UNSIGNED NULL DEFAULT NULL COMMENT 'post group',
PRIMARY KEY(`id`),
INDEX(`user_id`),
INDEX(`tm`,`group`)
);
We'd like to group together posts that are temporally close.
First, declare the wanted granularity: the threshold to temporal proximity:
SET @granularity:=60*60;
Each row forms a group with group ID matching the row id (it can also be a timestamp):
SELECT `g`.`id` AS `group`
FROM `feed` `g`;
Each group contains rows that originate from the same user, were posted earlier than the group-former:
SELECT `g`.`id` AS `group`, `f`.*
FROM `feed` `g`
CROSS JOIN `feed` `f`
ON (`f`.`user_id` = `g`.`user_id`
AND `f`.`tm` BETWEEN `g`.`tm`-@granularity AND `g`.`tm`
)
Each row belongs to multiple groups. For each row, we pick the most 'broad' group: it has the biggest rowId
SELECT MAX(`g`.`id`) AS `group`, `f`.*
FROM `feed` `g`
CROSS JOIN `feed` `f`
ON (`f`.`user_id` = `g`.`user_id`
AND `f`.`tm` BETWEEN `g`.`tm`-@granularity AND `g`.`tm`
)
GROUP BY `f`.`id`
The most recently updated group always jumps to the top (if you sort by group
DESC).
However, if you'd like the groups be persistent (e.g. so items don't move from one group to another), use MIN
instead of MAX
:
SELECT MIN(`g`.`id`) AS `group`, `f`.*
FROM `feed` `g`
CROSS JOIN `feed` `f`
ON (`f`.`user_id` = `g`.`user_id`
AND `f`.`tm` BETWEEN `g`.`tm` AND `g`.`tm`+@granularity
)
GROUP BY `f`.`id`
Now, we're going to update the table's group
column.
First, MySQL can't update the same table you're reading from. Wee need a temporary table.
Second: we only update the rows whose group
column is NULL, or rows posted later than UNIX_TIMESTAMP()-2*@threshold
:
CREATE TEMPORARY TABLE `_feedg`
SELECT MAX(`g`.`id`) AS `group`, `f`.`id`
FROM `feed` `g`
CROSS JOIN `feed` `f`
ON (`f`.`user_id` = `g`.`user_id`
AND `f`.`tm` BETWEEN `g`.`tm`-@granularity AND `g`.`tm`
)
WHERE `f`.`group` IS NULL
OR `f`.`tm` >= (UNIX_TIMESTAMP()-2*@granularity)
GROUP BY `f`.`id`;
And update the group
column:
UPDATE `feed` `f` CROSS JOIN `_feedg` `g` USING(`id`)
SET `f`.`group` = `g`.`group`;
Here's the SQLFiddle: http://sqlfiddle.com/#!2/be9ce/15