There are 3 tables and I linked three tables with one query. But the problem is that the result returns in 7-8 seconds. One of the columns I join is JSON type and the data in it is json. For this reason, I did the join process with JSON. How can I make the following query faster? Or can I achieve the same result using another method?
Here is a table example for sales with json column:
And the query:
SELECT
`s`.`id` AS `id`,
`s`.`status` AS `status`,
group_concat(`c`.`display_name` separator ', ') AS `buyer`,
`u`.`display_name` AS `partner`,
`s`.`sales_price` AS `sales_price`,
FROM `sales` AS `s`
LEFT JOIN `users` AS `u` ON (`u`.`id` = `s`.`user_id`)
JOIN `contacts` AS `c` ON (json_contains(`s`.`buyers`, concat('"', `c`.`id`, '"')))
GROUP BY
`s`.`id`
My tables:
SALES
column_name | column_type |
---|---|
id | integer |
buyers | varchar(255) |
user_id | integer |
status | integer |
sales_price | double(13,2) |
CONTACTS
column_name | column_type |
---|---|
id | integer |
display_name | varchar(255) |
USERS
column_name | column_type |
---|---|
id | integer |
display_name | varchar(255) |
SALES.user_id => USERS.id SALES.buyers => CONTACTS.id
Sales.buyers can be multiple or single like this:
If the sale has one contact: ["774"] or If the sale has two contacts: ["774", "854"]
I want to see this output:
S.id | S.status | C.display_name AS BUYER | U.display_name AS PARTNER | S.sales_price |
---|---|---|---|---|
1 | 1 | Michael Owen | David Beckham | 199999 |
2 | 1 | Ariel Ortega, Mauro Icardi | Leo Messi | 219000 |
3 | 0 | Nicholas Anelka, Didier Drogba | Thierry Henry | 710000 |
EXPLAIN result is here:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | ALL | 12455638 | ||||||
2 | DERIVED | s | ALL | 857 | Using temporary; Using filesort | ||||
2 | DERIVED | u | eq_ref | PRIMARY | PRIMARY | 4 | s.user_id | 1 | |
2 | DERIVED | c | ALL | 14533 | Using where; Using join buffer (flat, BNL join) | ||||
3 | UNION | s2 | ALL | buyers_index | 857 | Using where | |||
3 | UNION | u2 | eq_ref | PRIMARY | PRIMARY | 4 | s2.user_id | 1 | |
UNION RESULT | <union2,3> | ALL |
You have correctly identified the source of your performance trouble as the arrays of contacts.id
values you store in your sales.status
columns, in your case using the JSON format. SQL truly does not work well with data designs containing arrays in columns, whether JSON, comma-separated lists, or whatever.
Specifically, your ON (json_contains(s.buyers, concat('"', c.i, '"')))
condition cannot exploit any index -- it isn't sargable -- so it's inherently slow.
It looks like you have a many-to-many relationship between rows of sales
and contacts
. The SQLish way to express that is with a junction table. You might call it sales_buyers
. It will contain rows like this.
sales_id contact_id
1 774
2 774
2 854
These sample rows mean sales.id
1 has a single buyer, contact.id
774. sales.id
2 has two buyers, 774 and 854.
Once you have that kind of junction table you can change your query to look like this:
...
FROM sales AS s
LEFT JOIN users AS u ON u.id = s.user_id
JOIN sales_buyers AS sb ON s.id = sb.sales_id
JOIN contacts AS c ON sb.contact_id = c.id
...
(Notice also that you need GROUP BY s.id, u.display_name
for your query to be correct.)
Your junction table will have this definition.
CREATE TABLE sales_buyers (
sales_id INT NOT NULL,
contact_id INT NOT NULL,
PRIMARY KEY (sales_id, contact_id),
INDEX contact_sales (contact_id, sales_id)
) ENGINE=InnoDB;
This table definition has indexes going both ways. That helps queries be faster whether you start with contacts
or sales
.
You create a sales - to - buyer relationship by INSERTing a row into this table, and remove a relationship by DELETEing the row.
Pro tip Skip the backticks. Don't use reserved words for column or table names and you don't need them. And they just make SQL harder to type and to read.