Search code examples
sqlsql-serverquery-optimizationaggregate-functions

Avoid Multiple subqueries in SQL with the same conditions


I have a DB of survey data that looks like this:

create table Products(
    Id int primary key,
    Name nvarchar(max)
)

create table Opinions(
    Id int primary key,
    ProductId int foreign key references Products(Id),
    IsEasyToUse bit default((0)),
    IsGoodPrice bit default((0)),
    IsDurable bit default((0)),
-- and so on...
)

I want to create a report that shows every product and a percentage of all the opinion columns from the Opinions table.

So far, I have been doing it the naive way:

select 
    p.Name,
    1.0 *
        (select count(*) from Opinions o where o.ProductId = p.Id and o.IsEasyToUse = 1) / 
        (select count(*) from Opinions o where o.ProductId = p.Id) as EasyToUse,
    1.0 *
        (select count(*) from Opinions o where o.ProductId = p.Id and o.IsGoodPrice = 1) /
        (select count(*) from Opinions o where o.ProductId = p.Id) as GoodPrice,
    1.0 *
        (select count(*) from Opinions o where o.ProductId = p.Id and o.IsDurable = 1) /
        (select count(*) from Opinions o where o.ProductId = p.Id) as Durable
from
(select Id, Name from Products) p

This technically works, but I am not happy with the repetition of the conditions. Also the Opinions table is growing in size so the performance issues are starting to become noticeable (it now takes around a minute with 1m rows).

Is there a way to rewrite it in a way that's faster (and easier to modify in the future in case I want to apply more filters)?


Solution

  • As a starter: we can join and aggregate instead of executing multiples correlated subqueries.

    Then: we don't need to compute the total count; instead, we can just juse avg(); less calls to aggregate functions should improve the efficiency of the query:

    select p.id, p.name
        avg(case when IsEasyToUse = 1 then 1.0 else 0 end) as EasyToUse,
        avg(case when IsGoodPrice = 1 then 1.0 else 0 end) as GoodPrice,
        avg(case when IsDurable   = 1 then 1.0 else 0 end) as Durable 
    from products p
    inner join opinions o on o.productID = p.id
    group by p.id, p.name
    

    Depending on how your data is spread across the two tables, apply might offer a good alternative, since it avoids outer aggregation:

    select p.id, p.name, x.*
    from products p
    cross apply (
        select 
            avg(case when IsEasyToUse = 1 then 1.0 else 0 end) as EasyToUse,
            avg(case when IsGoodPrice = 1 then 1.0 else 0 end) as GoodPrice,
            avg(case when IsDurable   = 1 then 1.0 else 0 end) as Durable 
        from opinions o 
        where o.productID = p.id
    ) x