Search code examples
mysqldatabase-designquery-optimizationdatabase-normalization

Joining with a json column makes the result very slow


There are 3 tables and I linked three tables with one query. But the problem is that the result returns in 7-8 seconds. One of the columns I join is JSON type and the data in it is json. For this reason, I did the join process with JSON. How can I make the following query faster? Or can I achieve the same result using another method?

Here is a table example for sales with json column:

sales table example

And the query:

SELECT
    `s`.`id` AS `id`,
    `s`.`status` AS `status`,
    group_concat(`c`.`display_name` separator ', ') AS `buyer`,
    `u`.`display_name` AS `partner`,
    `s`.`sales_price` AS `sales_price`,
FROM `sales` AS `s`
LEFT JOIN `users` AS `u` ON (`u`.`id` = `s`.`user_id`)
JOIN `contacts` AS `c` ON (json_contains(`s`.`buyers`, concat('"', `c`.`id`, '"')))
GROUP BY
    `s`.`id`

My tables:

SALES

column_name column_type
id integer
buyers varchar(255)
user_id integer
status integer
sales_price double(13,2)

CONTACTS

column_name column_type
id integer
display_name varchar(255)

USERS

column_name column_type
id integer
display_name varchar(255)

SALES.user_id => USERS.id SALES.buyers => CONTACTS.id

Sales.buyers can be multiple or single like this:

If the sale has one contact: ["774"] or If the sale has two contacts: ["774", "854"]

I want to see this output:

S.id S.status C.display_name AS BUYER U.display_name AS PARTNER S.sales_price
1 1 Michael Owen David Beckham 199999
2 1 Ariel Ortega, Mauro Icardi Leo Messi 219000
3 0 Nicholas Anelka, Didier Drogba Thierry Henry 710000

EXPLAIN result is here:

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY ALL 12455638
2 DERIVED s ALL 857 Using temporary; Using filesort
2 DERIVED u eq_ref PRIMARY PRIMARY 4 s.user_id 1
2 DERIVED c ALL 14533 Using where; Using join buffer (flat, BNL join)
3 UNION s2 ALL buyers_index 857 Using where
3 UNION u2 eq_ref PRIMARY PRIMARY 4 s2.user_id 1
UNION RESULT <union2,3> ALL

Solution

  • You have correctly identified the source of your performance trouble as the arrays of contacts.id values you store in your sales.status columns, in your case using the JSON format. SQL truly does not work well with data designs containing arrays in columns, whether JSON, comma-separated lists, or whatever.

    Specifically, your ON (json_contains(s.buyers, concat('"', c.i, '"'))) condition cannot exploit any index -- it isn't sargable -- so it's inherently slow.

    It looks like you have a many-to-many relationship between rows of sales and contacts. The SQLish way to express that is with a junction table. You might call it sales_buyers. It will contain rows like this.

    sales_id   contact_id
       1       774
       2       774
       2       854
    

    These sample rows mean sales.id 1 has a single buyer, contact.id 774. sales.id 2 has two buyers, 774 and 854.

    Once you have that kind of junction table you can change your query to look like this:

    ...
    FROM sales AS s
    LEFT JOIN users AS u    ON u.id = s.user_id
    JOIN sales_buyers AS sb ON s.id = sb.sales_id
    JOIN contacts AS c      ON sb.contact_id = c.id 
    ...
    

    (Notice also that you need GROUP BY s.id, u.display_name for your query to be correct.)

    Your junction table will have this definition.

    CREATE TABLE sales_buyers (
        sales_id INT NOT NULL,
        contact_id INT NOT NULL,
        PRIMARY KEY (sales_id, contact_id),
        INDEX contact_sales (contact_id, sales_id)
    ) ENGINE=InnoDB;
    

    This table definition has indexes going both ways. That helps queries be faster whether you start with contacts or sales.

    You create a sales - to - buyer relationship by INSERTing a row into this table, and remove a relationship by DELETEing the row.

    Pro tip Skip the backticks. Don't use reserved words for column or table names and you don't need them. And they just make SQL harder to type and to read.