Search code examples
mysqlperformancecountinnodb

MySQL subquery count with calendar table slow


I have a sales table in MySQL (InnoDB). It's +- 1 million records big. I would like to show some nice charts. Fetching the right data is not a problem. Fetching it fast is...

So I like to count the amount of sales in table A grouped per day (later on also month, and year) for PERIOD A till Z. Concrete; for the last 30 days I like to know for each day how many sales records we have in the DB.

So MySQL would have to return something like this:

I like to achieve that MySQL returns the data like this:

date, count
2017-04-01, 2482
2017-04-02, 1934
2017-04-03, 2701
...

The structure of the Sales basically like this:

CREATE TABLE `sales` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `deleted_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `contacts_created_at_index` (`created_at`),
  KEY `contacts_deleted_at_index` (`deleted_at`),
  KEY `ind_created_at_deleted_at` (`created_at`,`deleted_at`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Some days (datapoints) might not have any results, but I don't like to have gaps in the data. So I also have some 'calendar' table.

CREATE TABLE `time_dimension` (
  `id` int(11) NOT NULL,
  `db_date` date NOT NULL,
  `year` int(11) NOT NULL,
  `month` int(11) NOT NULL,
  `day` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `td_ymd_idx` (`year`,`month`,`day`),
  UNIQUE KEY `td_dbdate_idx` (`db_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Fetching 30 rows (30 days) with a count per day takes 30 secs...

This is the first query I tried:

SELECT 
    `db_date` AS `date`,
    (SELECT 
            COUNT(1)
        FROM
            sales
        WHERE
            DATE(created_at) = db_date) AS count
FROM
    `time_dimension`
WHERE
    `db_date` >= '2017-04-11'
        AND `db_date` <= '2017-04-25'
ORDER BY `db_date` ASC

But like I said it's really slow (11.9 secs). I tried al kinds of other approaches, but without luck. For example:

SELECT time_dimension.db_date AS DATE,
       COUNT(1) AS count
FROM sales RIGHT JOIN time_dimension ON (DATE(sales.created_at) =         
    time_dimension.db_date)
WHERE
    (time_dimension.db_date BETWEEN '2017-03-11' AND '2017-04-11')
GROUP BY
    DATE

A query for just 1 datapoint takes only 5.4ms:

SELECT COUNT(1) FROM sales WHERE created_at BETWEEN '2017-04-11 00:00:00' AND '2017-04-25 23:59:59'

I haven't checked innodb_buffer_poolsize on my local machine. I will check that as well. Any ideas on how to make queries like this fast? In the future I would even need to where clauses and joins, to filter the set of sales records..

Thanks.

Nick


Solution

  • You could try to count sale data first, then join count result with your calendar table.

    SELECT time_dimension.db_date AS date, 
           by_date.sale_count 
    FROM   time_dimension 
           LEFT JOIN (SELECT DATE(sales.created_at) sale_date, 
                             COUNT(1)               AS sale_count 
                      FROM   sales 
                      WHERE  created_at BETWEEN '2017-03-11 00:00:00' AND 
                                                '2017-04-11  23:59:59' 
                      GROUP  BY DATE(sales.created_at)) by_date 
                  ON time_dimension.db_date = by_date.sale_date 
    WHERE  time_dimension.db_date BETWEEN '2017-03-11' AND '2017-04-11'