Search code examples
mysqloptimizationpivot-tableentity-attribute-value

How to Optimize/Refactor MySQL Pivot Table Performance when using Where Clause


I have two simple MySQL tables - one index table t_id, which has a unique primary id; and a pivot table t_data spreading those id's over various data fields:

CREATE TABLE `t_id` (  
  `id` bigint(12) NOT NULL AUTO_INCREMENT,  
  PRIMARY KEY (`id`)  
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;  

CREATE TABLE `t_data` (  
  `id` int(11) NOT NULL,  
  `field` varchar(50) CHARACTER SET cp1251 NOT NULL,  
  `value` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci 
DEFAULT NULL,  
  UNIQUE KEY `idxfield` (`id`,`field`),  
  KEY `value` (`value`)  
) ENGINE=InnoDB DEFAULT CHARSET=utf8;  

Here is some sample data:

+----+--------------+-------------------+  
| id | field        | value             |  
+----+--------------+-------------------+  
| 1  | organization | Apple Inc.        |  
| 1  | state        | CA                |  
| 2  | organization | Adobe Inc.        |  
| 2  | state        | CA                |  
| 3  | organization | Alphabet Inc.     |  
| 3  | state        | CA                |  
| 4  | organization | Rockwell Collins  |  
| 4  | state        | IA                |  
| 5  | organization | GEICO             |  
| 5  | state        | MD                |  
| 6  | organization | Anheuser-Busch    |  
| 6  | state        | MO                |  
| 7  | organization | Bank of America   |  
| 7  | state        | NC                |  
+----+--------------+-------------------+    

which can be reported on with a standard pivot table select query:

select  
  i.id,  
  ifnull (max(case when d.field = 'organization' then d.value end),'') 'organization',  
  ifnull (max(case when d.field = 'state' then d.value end),'') 'state'  
from `t_id` i  
left join `t_data` d  
on i.id = d.id  
group by i.id  
limit 0,10  

This simple example shows just two "virtual" fields (organization and state), with 7 unique id's:

+----+------------------+-------+  
| id | organization     | state |  
+----+------------------+-------+  
| 1  | Apple Inc.       |  CA   |  
| 2  | Adobe Inc.       |  CA   |  
| 3  | Alphabet Inc.    |  CA   |  
| 4  | Rockwell Collins |  IA   |  
| 5  | GEICO            |  MD   |  
| 6  | Anheuser-Busch   |  MO   |  
| 7  | Bank of America  |  NC   |  
+----+------------------+-------+  

In our actual production situation we have dozens of "virtual" fields (not just 2), and millions of unique id's (not just 7). The database performs quite well doing crud type queries on single id's (less than a second), and even listing one limit group at a time (again less than a second). The problem arises when an attempt is made to constrain a select with a where clause (queries are taking tens of seconds). For example, to find all the organizations in California:

select  
 x.id,  
 x.organization,  
 x.state  
from  
(  
select  
  i.id,  
  ifnull (max(case when d.field = 'organization' then d.value end),'') 'organization',  
  ifnull (max(case when d.field = 'state' then d.value end),'') 'state'  
from `t_id` i  
left join `t_data` d  
on i.id = d.id  
group by i.id  
) as x  
where x.state='CA'  
limit 0,10  


+----+---------------+-------+  
| id | organization  | state |  
+----+---------------+-------+  
| 1  | Apple Inc.    |  CA   |  
| 2  | Adobe Inc.    |  CA   |  
| 3  | Alphabet Inc. |  CA   |  
+----+---------------+-------+  

this works but it takes a LONG time (again, 10's of seconds)! What is the best practice here - is there a better way to write these types of queries? How can these pivot table queries be optimized for a where clause?


Solution

  • This should be a lot faster with large data sets. Plus it can easily be extended to any number of "virtual" fields. You can place any search criteria you may have between the %%.

    select  
        i.id,  
        coalesce(max(case when field = 'organization' then value end), '') as organization,  
        coalesce(max(case when field = 'state' then value end), '') as state  
    from t_id i  
    left join t_data d  
    on i.id = d.id  
    and i.id like '%%'  
    and i.id in (  
      select id  
      from `t_data`  
      where `field` = 'organization'   
      and `value` like '%%'  
      and  id  in (  
        select id  
        from `t_data`  
        where `field` = 'state'  
        and `value` like '%%'  
      )  
    )  
    group by i.id