Search code examples
sql-servert-sqloptimizationsubqueryquery-optimization

TSQL Combine multiple sub queries on same table


I've been trying to improve a SQL query which uses multiple sub queries over the same table but with different conditions and only retrieves the first result from each sub queries.

I will try to simplify the use-case :

I have a table Products like this:

Product_id reference field3 field 4
1 ref1 val1 val3
2 ref2 val2 val4

And another table History:

History_id reference utilcode physicalcode issue media datetime
1 ref1 'test' 'TST' '0' '&audio' 'a_date'
2 ref2 'phone' 'CALLER' '1' '&video' 'a_date'
3 ref2 'test' 'CALLER' '2' '&test' 'a_date'

History is a log table and therefore contains a lot of values.

Now I have a query like this

SELECT 
    p.reference,
    p.field3, p.field4,
    (SELECT TOP 1 a_date 
     FROM history h 
     WHERE h.reference = p.reference 
       AND physicalcode = 'TST' 
       AND issue = 0 
     ORDER BY a_date DESC) AS latest_date_issue_0,
    (SELECT TOP 1 a_date 
     FROM history h 
     WHERE h.reference = p.reference 
       AND physicalcode = 'TST' 
       AND issue = 1 
     ORDER BY a_date DESC) AS latest_date_issue_1
    (SELECT TOP 1 a_date 
     FROM history h 
     WHERE h.reference = p.reference 
       AND utilcode = 'phone' 
     ORDER BY a_date DESC) AS latest_date_phone,
    (SELECT TOP 1 media 
     FROM history h 
     WHERE h.reference = p.reference 
       AND utilcode = 'phone' 
     ORDER BY a_date DESC) AS latest_media,
    -- and so on with many possible combinations
    -- Note that there are more than this few fields on the tables I work on.
WHERE
    p.field3 = 'valX',
    p.field4 = 'valY'
FROM
    products p

How could I merge every sub selects ? Or even a few that are alike to improve the performance ?

History being a very big table, selecting over it multiple times drastically slows down the query.

The main problem being that I only need the first value every time.

Thank you for your time and I hope to find a better way to deal with this issue!

I tried to use ROW_NUMBER() but I could not find a suitable way to use it.
I also tried to create a tmp table using WITH to group every possibility from history but it was worse.

EDIT : Execution plan https://www.brentozar.com/pastetheplan/?id=Sy1AKIsUs


Solution

  • Thanks to @O.Jones I've been able to find a way to improve this query.

    What I did to merge a few requests was to use a CTE like so :

    From

    SELECT
    (SELECT TOP 1 a_date 
         FROM history h 
         WHERE h.reference = p.reference 
           AND physicalcode = 'TST' 
           AND issue = 0 
         ORDER BY a_date DESC) AS latest_date_issue_0,
        (SELECT TOP 1 a_date 
         FROM history h 
         WHERE h.reference = p.reference 
           AND physicalcode = 'TST' 
           AND issue = 1 
         ORDER BY a_date DESC) AS latest_date_issue_1
         (SELECT top 1 a_date 
          FROM history h 
          WHERE h.reference = p.reference 
            AND h.physicalcode = 'TSTKO' 
          ORDER BY h.d_systeme DESC ) AS d_tst_ko,
         (SELECT top 1 a_date 
           FROM history h 
           WHERE h.reference = p.reference 
             AND h.physicalcode = 'CALLERID' 
           ORDER BY h.d_systeme DESC ) AS d_wrong_number
    FROM products p
    

    To

    WITH physicalcode_cte (reference, physicalcode, issue, a_date)  as
        (
            SELECT reference, physicalcode, issue, max(a_date)
            from historique
            where codephysique in ('TST','TSTKO','CALLERID')
            and a_date > dateadd(month, -4, getdate()) -- filter on date range to reduce number of rows
            group by reference, physicalcode, issue
        )
    SELECT
    date_issue_0.a_date,
    date_issue_1.a_date,
    tst_ko.a_date,
    wrong_number.a_date
    FROM products p
    LEFT JOIN physicalcode_cte date_issue_0 on p.reference = date_issue_0.reference 
            AND date_issue_0.codephysique = 'TST' 
            AND date_issue_0.anomalie = 0
    LEFT JOIN physicalcode_cte date_issue_1 on p.reference = date_issue_1.reference
            AND date_issue_1.codephysique = 'TST' 
            AND date_issue_1.anomalie = 1
    LEFT JOIN physicalcode_cte tst_ko on p.reference = tst_ko.reference
            AND tst_ko.codephysique = 'TST' 
    LEFT JOIN physicalcode_cte wrong_number on p.reference = wrong_number.reference AND 
            AND wrong_number.codephysique = 'TST' 
    

    I've applied this idea for different scenarii and made 2 CTE. I couldn't merge everything, sometime merging caused cost increase. But after several tests I've been able to go from 7100 total cost to 2100.
    It is still a lot but 3 times less anyway. Takes 5 seconds instead of a timeout.
    It's a query used for monthly reports so I don't need it to be super fast, I will keep it that way.

    Thanks you!