sql-server t-sql join sql-server-2005 join-hints

Advantages of specifying HASH JOIN over just doing a JOIN?

What are the advantages, if any, of explicitly specifying a HASH JOIN over a regular JOIN (where SQL Server decides the best join strategy)? For example:

select pd.*
from profiledata pd
inner hash join profiledatavalue val on val.profiledataid=pd.id

In the simplified sample code above, I'm specifying the JOIN strategy, whereas if I leave off the "hash" keyword, SQL Server does a MERGE JOIN behind the scenes (per the "actual execution plan").

Solution

The optimiser does a good enough job for everyday use. However, in an extreme case it might in theory need 3 weeks to find the perfect plan, so it stops searching once it has a reasonable one, and there is a chance that the generated plan will not be ideal.

I'd leave the join strategy to the optimiser unless you have a very complex query or a huge amount of data for which it simply can't produce a good plan. Only then would I consider a hint.

But over time, as the data changes or grows, or as indexes change, your JOIN hint will become obsolete and prevent an optimal plan. A JOIN hint can only optimise that single query for the data set you had at the time of development.
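One way to check whether a hint is still earning its keep is to run the hinted and unhinted versions side by side and compare the reported I/O and timings. This is only a sketch: it reuses the table and column names from the question and assumes you can run both queries against representative data.

```sql
-- Sketch only: table and column names are taken from the question.
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- 1) Let the optimiser choose the join strategy.
SELECT pd.*
FROM profiledata pd
INNER JOIN profiledatavalue val ON val.profiledataid = pd.id;

-- 2) Same query with the hash join forced.
SELECT pd.*
FROM profiledata pd
INNER HASH JOIN profiledatavalue val ON val.profiledataid = pd.id;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Compare the numbers in the Messages tab along with the actual execution plans. Bear in mind that a join hint in the FROM clause also makes SQL Server enforce the join order as written, so the hinted query is constrained in more than just the join algorithm.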

Personally, I've never specified a JOIN hint in any production code.

I've normally solved a bad join by changing my query around, adding or changing an index, or breaking the query up (e.g. loading a temp table first). Or my query was just wrong, or I had an implicit data type conversion, or it highlighted a flaw in my schema, etc.
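As a rough illustration of two of those alternatives, here is a sketch using the tables from the question. The index name, the filter, and the assumption that id is numeric are all hypothetical, so adapt them to your own schema.

```sql
-- Option 1: index the join column so the optimiser has a cheaper access path.
-- The index name is hypothetical.
CREATE NONCLUSTERED INDEX IX_profiledatavalue_profiledataid
    ON profiledatavalue (profiledataid);

-- Option 2: break the query up by staging the driving rows in a temp table.
-- The WHERE clause is a hypothetical filter and assumes id is numeric.
SELECT pd.*
INTO #pd
FROM profiledata pd
WHERE pd.id BETWEEN 1 AND 1000;

-- Join the (smaller) staged set to the detail table without any join hint.
SELECT p.*
FROM #pd AS p
INNER JOIN profiledatavalue val ON val.profiledataid = p.id;

DROP TABLE #pd;
```

Either change gives the optimiser better information to work with, rather than overriding its choice, so the plan can still adapt as the data changes.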

I've seen other developers use them, but only where they had complex views nested upon complex views, and the hints caused problems later when they refactored.

Edit:

I had a conversation today in which some colleagues said they are going to use them to force a bad query plan (with NOLOCK and MAXDOP 1) to "encourage" migration away from legacy complex nested views that one of their downstream systems calls directly.