Search code examples
mysqlexplain

MySQL GROUP BY with Using Temporary unnecessarily?


I am trying to optimize a query. Using EXPLAIN tells me it is Using temporary. This is really inefficient given the size of the table (20m+ records). Looking at the MySQL documentation Internal Temporary Tables I don't see anything that would imply the need for a Temporary table in my query. I also tried setting the ORDER BY to the same as the GROUP BY, but still says Using Temporary and query takes forever to run. I am using MySQL 5.7.

Is there a way to avoid using a temporary table for this query:

SELECT url,count(*) as sum 
FROM `digital_pageviews` as `dp` 
WHERE `publisher_uuid` = '8b83120e-3e19-4c34-8556-7b710bd7b812' 
GROUP BY url 
ORDER BY NULL;

This is my table schema:

create table digital_pageviews
(
  id             int unsigned auto_increment
    primary key,
  visitor_uuid   char(36)            null,
  publisher_uuid char(36) default '' not null,
  property_uuid  char(36)            null,
  ip_address     char(15)            not null,
  referrer       text                null,
  url_delete     text                null,
  url            varchar(255)        null,
  url_tmp        varchar(255)        null,
  meta           text                null,
  date_created   timestamp           not null,
  date_updated   timestamp           null
)
  collate = utf8_unicode_ci;

create index digital_pageviews_url_index
  on digital_pageviews (url);

create index ndx_date_created
  on digital_pageviews (date_created);

create index ndx_property_uuid
  on digital_pageviews (property_uuid);

create index ndx_publisher_uuid
  on digital_pageviews (publisher_uuid);

create index ndx_visitor_uuid_page
  on digital_pageviews (visitor_uuid);


Solution

  • The reason it needs a temporary table is that it cannot both filter by publisher_uuid and sort on a column without an index to do both. The first step is to filter by publisher_uuid, so it uses the index on publisher_uuid.

    However, next it has to group by and order the records, which will require a temporary table because it cannot use an index which will do this. The reason it cannot use an index is that it already used the publisher_uuid, which is not indexed on the url field to do the group by or on the field you are ordering by.

    To filter where publisher_uuid = '8b83120e-3e19-4c34-8556-7b710bd7b812', group by url, and order by url, create an index with these fields in this order:

    • publisher_uuid
    • url
    create index ndx_publisher_uuid
      on digital_pageviews (publisher_uuid, url);