Search code examples
mysqlfilterrelational

MySQL filter query with relation


I'm having the following problem with 2 MySQL tables that have a relation: I can easily query table 1 (address) when I want a full list or filter the result by name or email or such. But now I need to query table 1 and filter it based on the relational content of table 2 (interests). So, I need to find a row (usually many rows) in table 1 only if a (or more) conditions are met in table 2.

Here are the tables:

CREATE TABLE IF NOT EXISTS `address` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `email` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `countryCode` char(2) COLLATE utf8_unicode_ci DEFAULT NULL,
  `languageCode` char(2) COLLATE utf8_unicode_ci DEFAULT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `emailUnique` (`email`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

INSERT INTO `address` (`id`, `name`, `email`, `countryCode`, `languageCode`, `timestamp`) VALUES
(1, '', 'dummy@test.com', 'BE', 'nl', '2010-07-16 14:07:00'),
(2, '', 'test@somewhere.com', 'BE', 'fr', '2010-07-16 14:10:25');

CREATE TABLE IF NOT EXISTS `interests` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `address_id` int(11) unsigned NOT NULL,
  `cat` char(2) COLLATE utf8_unicode_ci NOT NULL,
  `subcat` char(2) COLLATE utf8_unicode_ci NOT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `address_id` (`address_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

INSERT INTO `interests` (`id`, `address_id`, `cat`, `subcat`, `timestamp`) VALUES
(1, 1, 'aa', 'xx', '2010-07-16 14:07:00'),
(2, 1, 'aa', 'yy', '2010-07-16 14:07:00'),
(3, 2, 'aa', 'xx', '2010-07-16 14:07:00'),
(4, 2, 'bb', 'zz', '2010-07-16 14:07:00')
(5, 2, 'aa', 'yy', '2010-07-16 14:07:00');

ALTER TABLE `interests`
  ADD CONSTRAINT `interests_ibfk_1` FOREIGN KEY (`address_id`) REFERENCES `address` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;

For example, I need to find the address(es) that has (have) as interest cat=aa and subcat=xx. Or, another example, I need the address(es) with as interest both cat=aa and subcat=xx AND cat=aa and subcat=yy. Specially the latter is important and one has to keep in mind that both the address and the interest tables will be long lists and that the amount of cat/subcat combinations will vary. I'm working with reference queries through Zend_Db_Table (findDependentRowset) at the moment but that solution is way to slow for address lists numbering 100s and even 1000s of hits.

Thank you for your help.


Solution

  • I added another row in your interests table, to demonstrate a different result set between the two examples:

    INSERT INTO interests VALUES (6, 2, 'aa', 'vv', '2010-07-16 14:07:00');
    

    Then you may want to try using correlated subqueries as follows:

    SELECT * 
    FROM   address a 
    WHERE  EXISTS (SELECT id 
                   FROM   interests 
                   WHERE  address_id = a.id AND 
                          (cat = 'aa' and subcat = 'xx'));
    

    Result:

    +----+------+--------------------+-------------+--------------+---------------------+
    | id | name | email              | countryCode | languageCode | timestamp           |
    +----+------+--------------------+-------------+--------------+---------------------+
    |  1 |      | dummy@test.com     | BE          | nl           | 2010-07-16 14:07:00 |
    |  2 |      | test@somewhere.com | BE          | fr           | 2010-07-16 14:10:25 |
    +----+------+--------------------+-------------+--------------+---------------------+
    2 rows in set (0.00 sec)
    

    For the second example, we're testing for the new row added previously in order not to have the same result as above:

    SELECT * 
    FROM   address a 
    WHERE  EXISTS (SELECT id 
                   FROM   interests 
                   WHERE  address_id = a.id AND 
                          (cat = 'aa' and subcat = 'xx')) AND
           EXISTS (SELECT id 
                   FROM   interests 
                   WHERE  address_id = a.id AND 
                          (cat = 'aa' and subcat = 'vv'));
    

    Result:

    +----+------+--------------------+-------------+--------------+---------------------+
    | id | name | email              | countryCode | languageCode | timestamp           |
    +----+------+--------------------+-------------+--------------+---------------------+
    |  2 |      | test@somewhere.com | BE          | fr           | 2010-07-16 14:10:25 |
    +----+------+--------------------+-------------+--------------+---------------------+
    1 row in set (0.00 sec)
    

    Using correlated subqueries is easy and straightforward. However keep in mind that it might not be the best in terms of performance, because the correlated subqueries will be executed once for each address in the outer query.