I have an MySQL InnoDB table with 5.7M rows and 1.9GB size:
+-------------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------+------+-----+---------+----------------+
| id | int(20) | NO | PRI | NULL | auto_increment |
| listing_id | int(20) | YES | | NULL | |
| listing_link | text | YES | | NULL | |
| transaction_title | text | YES | | NULL | |
| image_thumb | text | YES | | NULL | |
| seller_link | text | YES | | NULL | |
| seller_name | text | YES | | NULL | |
| sale_date | date | YES | | NULL | |
+-------------------+---------+------+-----+---------+----------------+
Here are my my.ini settings for my 3GB RAM server:
key_buffer = 16M
max_allowed_packet = 16M
sort_buffer_size = 8M
net_buffer_length = 8K
read_buffer_size = 2M
read_rnd_buffer_size = 16M
myisam_sort_buffer_size = 8M
log_error = "mysql_error.log"
innodb_autoinc_lock_mode=0
join_buffer_size = 8M
thread_cache_size = 8
thread_concurrency = 8
query_cache_size = 64M
query_cache_limit = 2M
ft_min_word_len = 4
thread_stack = 192K
tmp_table_size = 64M
innodb_buffer_pool_size = 2G
innodb_additional_mem_pool_size = 16M
innodb_log_file_size = 512M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 120
innodb_write_io_threads = 8
innodb_read_io_threads = 8
innodb_thread_concurrency = 16
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
When I run next query it takes over 20 minutes to return the results:
SELECT transaction_title,
listing_id,
seller_name,
Max(sale_date) AS sale_date,
Count(*) AS count
FROM sales_meta
WHERE `sale_date` BETWEEN '2017-06-06' AND '2017-06-06'
GROUP BY listing_id
HAVING Count(*) > 1
ORDER BY count DESC,
seller_name;
I've done some research and it appears I need to add some indexes to speed things up, but I am confused how to go about it. There are some single-column indexes and some multi-column indexes, which one should I do?
To make things more complicated, there are few other queries that I will need to perform on this table regularly:
SELECT *
FROM sales_meta
WHERE ` sale_date `= '2017-06-06';
and
SELECT DISTINCT `seller_name`
FROM `sales_meta`;
These two are probably less taxing, but I still need to optimize for them as well if possible, although the first query out of three is the top priority for now.
if you want just the value for a single day and the the data type is date then you could avoid the between clause and use =
SELECT transaction_title,
listing_id,
seller_name,
Max(sale_date) AS max_sale_date,
Count(*) AS count
FROM sales_meta
WHERE sale_date = str_to_date('2017-06-06', '%Y-%m-%d')
GROUP BY listing_id
HAVING Count(*) > 1
ORDER BY count DESC, seller_name;
and be sure you have an index on sale_date