Tags: mysql, database, group-concat

Check if combinations (grouped by timestamp) exist in MySQL


Fiddle Example

I have a page that allows people to compare items from the database. The number of items in each comparison is not fixed. I'm using a two-column table (product_id, timestamp) to store the comparison records, with the timestamp used to group together the products that were compared at the same time.

Suppose I have the product_ids 304, 308 and 306. How can I check whether they exist in the database as one group? I'm not sure how to check a combination of multiple product_ids; I'm looking for output like this:

Output:

product_id    name
306|308|304   D|E|C

Here's the query:

SELECT product_id, name
FROM (
    SELECT
        GROUP_CONCAT(c.product_id SEPARATOR "|") AS product_id,
        GROUP_CONCAT(p.name SEPARATOR "|") AS name
    FROM compare c
    INNER JOIN product p ON p.product_id = c.product_id
    GROUP BY c.timestamp
    ORDER BY c.timestamp
    /* How to do a where clause here? WHERE p.product_id = 306 AND p.product_id = 308 AND p.product_id = 304 */
) e
GROUP BY product_id


CREATE TABLE compare
    (`product_id` int,`timestamp` timestamp)
;

INSERT INTO compare
    (`product_id`,`timestamp`)
VALUES
    (300,'2015-01-12 19:04:13'),
    (302,'2015-01-12 19:04:13'),
    (304,'2015-01-12 19:06:24'),
    (306,'2015-01-12 19:06:24'),
    (308,'2015-01-12 19:06:24'),
    (310,'2015-01-12 19:08:40'),
    (312,'2015-01-12 19:08:40'),
    (314,'2015-01-12 19:08:40'),
    (316,'2015-01-12 19:08:40'),
    (302,'2015-01-12 19:10:50'),
    (316,'2015-01-12 19:10:50')

;

CREATE TABLE product
    (`product_id` int,`name` varchar(30))
;

INSERT INTO product
    (`product_id`,`name`)
VALUES
    (300,'A'),
    (302,'B'),
    (304,'C'),
    (306,'D'),
    (308,'E'),
    (310,'F'),
    (312,'G'),
    (314,'H'),
    (316,'I')

;
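A minimal sketch that reproduces the fiddle locally, so the grouping and the commented-out WHERE attempt can be inspected. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for MySQL; SQLite's group_concat takes the separator as a second argument instead of the SEPARATOR keyword, and the order of values inside each concatenated group is not guaranteed.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL fiddle (assumption:
# SQLite is used only so this runs without a MySQL server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compare (product_id INTEGER, timestamp TEXT);
    CREATE TABLE product (product_id INTEGER, name TEXT);
""")
conn.executemany(
    "INSERT INTO compare (product_id, timestamp) VALUES (?, ?)",
    [
        (300, "2015-01-12 19:04:13"), (302, "2015-01-12 19:04:13"),
        (304, "2015-01-12 19:06:24"), (306, "2015-01-12 19:06:24"),
        (308, "2015-01-12 19:06:24"),
        (310, "2015-01-12 19:08:40"), (312, "2015-01-12 19:08:40"),
        (314, "2015-01-12 19:08:40"), (316, "2015-01-12 19:08:40"),
        (302, "2015-01-12 19:10:50"), (316, "2015-01-12 19:10:50"),
    ],
)
conn.executemany(
    "INSERT INTO product (product_id, name) VALUES (?, ?)",
    [(300, "A"), (302, "B"), (304, "C"), (306, "D"), (308, "E"),
     (310, "F"), (312, "G"), (314, "H"), (316, "I")],
)

# One row per timestamp group, like the inner query in the question.
groups = conn.execute("""
    SELECT c.timestamp,
           group_concat(c.product_id, '|'),
           group_concat(p.name, '|')
    FROM compare c
    JOIN product p ON p.product_id = c.product_id
    GROUP BY c.timestamp
    ORDER BY c.timestamp
""").fetchall()
for row in groups:
    print(row)

# The WHERE clause from the question is tested against one row at a time,
# and no single row holds three different product_ids, so nothing matches.
no_rows = conn.execute("""
    SELECT c.timestamp
    FROM compare c
    JOIN product p ON p.product_id = c.product_id
    WHERE p.product_id = 306 AND p.product_id = 308 AND p.product_id = 304
""").fetchall()
print(no_rows)  # -> []
```

The second query shows why filtering with WHERE before grouping cannot express "all of these ids in one group".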

Solution

  • If I understand your intention correctly, you want to maintain a list of comparisons, be able to answer whether a certain comparison has already taken place, and probably keep this list deduplicated.

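    Your approach won't work. A WHERE clause is evaluated one row at a time, so a condition like p.product_id = 306 AND p.product_id = 308 can never be true, and grouping by timestamp would also merge two different comparisons that happen to be saved in the same second.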

    What you need is an efficient way to map a set of product_ids to a single identifier. Here's one possible approach:

    CREATE TABLE comparison (
        id int not null auto_increment,
        created_at timestamp default current_timestamp,
        hash varchar(16), -- or some other type, depending on the hash function of your choice
        primary key (id),
        key (hash)
    );
    
    CREATE TABLE comparison_item (
        comparison_id int not null,
        product_id int not null,
        primary key (comparison_id, product_id)
    );
    

    When creating a new comparison (or checking whether one already exists), compute a hash of the set of product_ids. For example, sort the product_ids, concatenate them with a separator, take the MD5 of the result and store the first half of its hexadecimal representation (16 of the 32 hex characters, which is shorter and still enough).

    To check for an already stored comparison, first look for a record with that hash.

    If one exists, fetch all rows from the second table with the corresponding comparison_id and compare them with your set, to make sure you haven't run into a hash collision.

    If none exists, you have never stored this set before.

    With this structure you can still store when each comparison was created and check whether an individual product_id was ever used (for the latter you would need an additional key(product_id) on the second table).
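The hash-and-verify scheme described above can be sketched as follows. This is a minimal illustration under assumptions, not the answerer's code: SQLite (via Python's sqlite3 module) stands in for MySQL, the second table is named comparison_item, '|' is used as the separator before hashing, and the helpers comparison_hash, find_comparison and save_comparison are hypothetical names.

```python
import hashlib
import sqlite3

# SQLite version of the two tables (assumption: stands in for MySQL).
SCHEMA = """
CREATE TABLE comparison (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    hash TEXT
);
CREATE INDEX comparison_hash_idx ON comparison (hash);
CREATE TABLE comparison_item (
    comparison_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    PRIMARY KEY (comparison_id, product_id)
);
-- extra index so "was this product ever compared?" is cheap
CREATE INDEX comparison_item_product_idx ON comparison_item (product_id);
"""


def comparison_hash(product_ids):
    """Sort the ids, join them, MD5 the result and keep 16 of 32 hex chars.

    The '|' separator is an assumption: without one, [1, 23] and [12, 3]
    would both concatenate to '123'.
    """
    joined = "|".join(str(pid) for pid in sorted(set(product_ids)))
    return hashlib.md5(joined.encode("ascii")).hexdigest()[:16]


def find_comparison(conn, product_ids):
    """Return the id of a stored comparison with exactly this set, or None."""
    wanted = set(product_ids)
    candidates = conn.execute(
        "SELECT id FROM comparison WHERE hash = ?", (comparison_hash(wanted),)
    ).fetchall()
    for (cid,) in candidates:
        stored = {pid for (pid,) in conn.execute(
            "SELECT product_id FROM comparison_item WHERE comparison_id = ?",
            (cid,))}
        if stored == wanted:  # same hash AND same set: not a collision
            return cid
    return None  # no matching hash (or only collisions): never stored


def save_comparison(conn, product_ids):
    """Store the set unless it already exists; return its comparison id."""
    existing = find_comparison(conn, product_ids)
    if existing is not None:
        return existing
    cur = conn.execute("INSERT INTO comparison (hash) VALUES (?)",
                       (comparison_hash(product_ids),))
    cid = cur.lastrowid
    conn.executemany(
        "INSERT INTO comparison_item (comparison_id, product_id) VALUES (?, ?)",
        [(cid, pid) for pid in set(product_ids)])
    conn.commit()
    return cid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = save_comparison(conn, [304, 306, 308])
again = save_comparison(conn, [308, 304, 306])  # same set, different order
print(first == again)  # -> True
missing = find_comparison(conn, [304, 306])
print(missing)  # -> None
ever_used = conn.execute(
    "SELECT 1 FROM comparison_item WHERE product_id = ? LIMIT 1", (306,)
).fetchone() is not None
print(ever_used)  # -> True
```

Checking every row that shares the hash, rather than only the first, keeps the lookup correct even if two different sets ever produce the same 16-character prefix.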