I'm building an "Ads" Website based on Laravel as an API where it contains a dynamic-advanced search.
Ads can have multiple attributes and the user can have advanced search, but it leads to so many sub-queries if user checked so many attributes to filter by and the query become slow, Any advice about better query than this one.
select
*
from
`ads`
where
`ads`.`id` in (
select
`ad_attributes`.`ad_id`
from
`ad_attributes`
where
`attribute_id` = 32
and `value` in ('odio', 'quos', 'dignissimos', 'dolorem', 'qui')
)
and `ads`.`id` in (
select
`ad_attributes`.`ad_id`
from
`ad_attributes`
where
`attribute_id` = 171
and `value` in ('itaque', 'non', 'dolor', 'laborum')
)
and `ads`.`id` in (
select
`ad_attributes`.`ad_id`
from
`ad_attributes`
where
`attribute_id` = 111
and `value` in ('quia', 'non', 'nam', 'molestias')
)
and `ads`.`id` in (
select
`ad_attributes`.`ad_id`
from
`ad_attributes`
where
`attribute_id` = 144
and `value` in ('delectus', 'nam', 'exercitationem', 'sit')
)
and `ads`.`id` in (
select
`ad_attributes`.`ad_id`
from
`ad_attributes`
where
`attribute_id` = 160
and `value` in ('repellat', 'fugit', 'quaerat', 'vero')
)
and `ads`.`id` in (
select
`ad_attributes`.`ad_id`
from
`ad_attributes`
where
`attribute_id` = 176
and `value` in ('mollitia', 'voluptates', 'maxime', 'culpa')
)
and `ads`.`id` in (
select
`ad_attributes`.`ad_id`
from
`ad_attributes`
where
`attribute_id` = 177
and `value` in ('necessitatibus', 'id')
)
and here's the code for the search, PS. I'm using a mpyw/eloquent-has-by-non-dependent-subquery instead of using whereHas because its even slower....
->when(!empty($search->attrs), function (Builder $query) use ($search) {
foreach ($search->attrs as $key => $value) {
if (!is_null($value)) {
$query->hasByNonDependentSubquery('adAttributes', function (Builder $q) use ($value, $key, $search) {
$q->where('attribute_id', $key)
->when(is_array($value), fn($q) => $q->whereIn('value', $value))
->when(!is_array($value), fn($q) => $q->where('value', $value));
});
}
}
return $query;
});
I added Some of the data in my database, I cant share all of it because its dummy seeded data thats around 60k on Fiddle [https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=6f00cc71fff716837906c22d46b0c899][1]
This may be absolutely useless and perform really badly but I vaguely remember using something along these lines when battling an EAV performance issue many years ago. I do not have a suitable test dataset to test this against so I may be making a complete fool of myself by suggesting this.
select
*
from
`ads`
where
`ads`.`id` in (
select
`ad_attributes`.`ad_id`
from
`ad_attributes`
where
(`attribute_id` = 32 and `value` in ('odio', 'quos', 'dignissimos', 'dolorem', 'qui')) OR
(`attribute_id` = 171 and `value` in ('itaque', 'non', 'dolor', 'laborum')) OR
(`attribute_id` = 111 and `value` in ('quia', 'non', 'nam', 'molestias')) OR
(`attribute_id` = 144 and `value` in ('delectus', 'nam', 'exercitationem', 'sit')) OR
(`attribute_id` = 160 and `value` in ('repellat', 'fugit', 'quaerat', 'vero')) OR
(`attribute_id` = 176 and `value` in ('mollitia', 'voluptates', 'maxime', 'culpa')) OR
(`attribute_id` = 177 and `value` in ('necessitatibus', 'id'))
group by `ad_attributes`.`ad_id`
having count(`attribute_id`) = 7
)
EDIT
As stated in my first comment below - You may need to COUNT(DISTINCT attribute_id), depending on whether ads can have multiple rows for the same attribute_id
. As your test dataset does have multiple rows for the same ad_id, attribute_id pair, you need to add DISTINCT
.
I have added to your original SQL Fiddle and your db<>fiddle. Unfortunately db<>fiddle does not return the execution time for each query.
Based on the tables in both your fiddles, there are currently no indices on these tables. You also have a needless surrogate primary key on your ad_attributes table instead of the natural key on either (ad_id, attribute_id) or (ad_id, attribute_id, value). The second version is required if you allow/need multiple rows of the same attribute_id per ad_id, as it currently the case in your test data. Is this intended or just an error in the way your test data was created?
Of your 776 test rows in ad_attributes, there are 106 sets of (ad_id, attribute_id, attribute_option_id, value) with two or more copies, and the worst has six copies. I would suggest that your test data is not useful as it does not follow the basic rules and constraints that will be needed in your production dataset. Testing queries with totally random data is not particularly useful. Your test data should attempt to model what you are likely to see in production data.
I am not totally confident that I understand the intent of your data structure. What is the thinking behind having attribute_option_id and value columns? What is the thinking behind text and option attributes? Do you really ever need a text attribute? Think about this last one very carefully as it has the potential to have a very significant impact on data quality and performance as your dataset grows.