I have the following table structure:
+----------+------------------------+------+-----+------------------------------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------------+------+-----+------------------------------------------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| body | varchar(200) | NO | | Hey now! | |
| flags | int(10) unsigned | NO | | 0 | |
| views | int(10) unsigned | NO | | 1 | |
+----------+------------------------+------+-----+------------------------------------------+----------------+
and I want to select "only rows that have not been flagged more than 5% of their total views, and have been viewed at least 5 times."
Here is my query:
SELECT id,body
FROM tablename
WHERE id NOT IN (
SELECT id
FROM tablename
WHERE flags/views * 100 > 5.0
AND views > 5
ORDER BY id DESC
)
ORDER BY id DESC
LIMIT 6
I'm thinking that selecting "every single row that has been flagged more than 30% of total views" is going to be a ton of overhead, especially when the table grows to a very large number of rows. Could someone please help me optimize this?
I was also thinking of creating a "flag_score" column and just updating that each time something is flagged, that way I could just select on the flag_score column instead of doing the math within the select (and save myself the extra select query). Does that sound like a good approach? Thanks a bunch.
Edit: The other problem I was having is that if I simply do something like:
SELECT *
FROM tabelname
WHERE flags/views * 100 > 5.0
AND views > 5
ORDER BY id DESC
LIMIT 5
... if 4 out of the 5 posts have been flagged, it will only return 1 row! And I'd like the statement to return 5 rows.
The math is not very expensive, so as long as you have less than, say, 100 000 entries, you can do it like this with no worry.
As suggested by yourself you can always cache the condition:
UPDATE tablename
SET is_over_30_percent = (flags/views * 100 > 5.0)
WHERE id='id_of_updated_entry'
which has the advantage that you can put an index on *is_over_30_percent* so the query takes no performance at all.
Remember to put a combined index on *id_of_updated_entry* and date and id so the index can be used for selecting and ordering (ORDER BY is expensive).