Search code examples
sql-serversql-server-2012query-optimization

Abysmal performance - Hash join takes >80% of query cost in SQL Server 2012


This part of the query makes it quite bad, unfortunately can't see a way around, only optimize.

update #db  set contents = i.contents
from (select distinct
(select max(ac.contents) from ##dwv d
left join ##calendar c on 1=1
left join #db ac on d.id = ac.id
and c.ReportingPeriod = ac.DateValue and ac.Data_Type = 'ActivePeriod'
where d.ID = dd.id and month_number >= (cc.month_number-3)
and month_number <= cc.month_number) contents
,dd.id
,cc.ReportingPeriod
from #db dd
left join ##calendar cc on cc.ReportingPeriod = dd.DateValue
where  dd.Data_Type = 'ActivePeriod'
)i
where i.id = #db.id and i.ReportingPeriod = #dashboard.DateValue

I was trying to merge it first, but wasn't going somewhere fast, and the above puppy came to be.

The Idea is to mark every customer as active in any given period (year, month in format 'YYYYMM') according to a specific algorithm, so for every customer that matches the report criteria I need to have a row which will tell me if he was active (that is: bought something recently).

#db is a temp table where I'm gathering all the data that will be later used for aggregates to produce report - large table of several million rows, depending on timeframe:

Create table #db
(
     C_P varchar(6)
    ,Data_Type varchar(20)
    ,id int
    ,contents int
    ,DateValue varchar(10)
)

##dwv is a temp table where I'm dumping the result of a select on a large view (which itself is very slow), holds about 2.4 million rows

##calendar is an ad-hoc table which stores every period the report encompasses in same format 'YYYYMM':

select CONVERT(char(6), cast(@startdate as date), 112) "CP"
      ,CONVERT(char(6), cast(PKDate as date), 112) "RP"
      ,(ROW_NUMBER() over (order by (CONVERT(char(6), cast(PKDate as date), 112)) asc))-1
       as month_number
into ##calendar
from  [calendar].[dbo].[days]
where PKDate between @startdate and @enddate2
group by CONVERT(char(6), cast(PKDate as date), 112)

Query plan tells me that the bit c.ReportingPeriod = ac.DateValue is the cuplrit - takes 88% of the subquery cost with it, which in turns accounts for 87% of the cost of whole query.

What am I not seeing here and how can I improve that?


Solution

  • Hash Joins usually mean that the columns used in the JOIN are not indexed.

    Make sure you have covering indexes for these columns:

    d.id = ac.id and c.ReportingPeriod = ac.DateValue and ac.Data_Type