Search code examples
mysqldatabasetransactionsquery-optimization

MySQL poor performance with a large table


I have a monitoring table which holds monitoring data for some 200+ servers. Each server adds 3 records of data to the table every minute of every day.

I hold 6 months of data for historical reports for customers, and as you can imagine the table gets pretty large.

My issue currently is that running SELECT queries on this table is taking an age. I understand why; It's the sheer amount of rows its working through whilst performing the SELECT, but I have tried to reduce the result set significantly by adding in time lookups...

SELECT * FROM `host_monitoring_data` 
WHERE parent_id = 47 AND timestamp > (NOW() - INTERVAL 5 MINUTE);

... but still I'm looking at a long time before the data is returned to me.

I'm used to working with fairly small tables and this is by far the biggest that I've ever worked with, so I'm not familiar with how to overcome this sort of issue.

Any help at all is vastly appriciated.

My table structure is currently id, parent_id, timestamp, type, U, A, T

U,A,T is Used/Available/Total, Type tells me what kind of measurable we are working with, Timestamp is exactly that, parent_id is the id of the parent host to which the data belongs, and id is an auto-incrementing id for the record in question.

When I'm doing lookups, I'm basically trying to get the most recent 20 rows where parent_id = x or whatever, so I just do...

SELECT u,a,t from host_monitoring_data 
WHERE parent_id=X AND timestamp > (NOW() - INTERVAL 5 MINUTE) 
ORDER BY timestamp DESC LIMIT 20

EDIT 1 - Including the results of EXPLAIN:

EXPLAIN SELECT * FROM `host_monitoring_data` 
WHERE parent_id=36 AND timestamp > (NOW() - INTERVAL 5 MINUTE) 
ORDER BY timestamp DESC LIMIT 20

id select_type table                type possible_keys key key_len ref rows Extra
1  SIMPLE      host_monitoring_data ALL  NULL          NULL NULL   NUL 2865454
Using where; Using filesort

Solution

  • Based on your EXPLAIN report, I see it says "type: ALL" which means it's scanning all the rows (the whole table) for every query.

    You need an index to help it scan fewer rows.

    Your first condition for parent_id=X is an obvious choice. You should create an index starting with parent_id.

    The other condition on timestamp >= ... is probably the best second choice. Your index should include timestamp as the second column.

    You can create this index this way:

    ALTER TABLE host_monitoring_data ADD INDEX (parent_id, timestamp);
    

    You might like my presentation How to Design Indexes, Really and a video of me presenting it: https://www.youtube.com/watch?v=ELR7-RdU9XU

    P.S.: Please when you ask questions about query optimization, run SHOW CREATE TABLE <tablename> and include its output in your question. This shows us your columns, data types, current indexes, and constraints. Don't make us guess! Help us help you!