Search code examples
selectsql-server-2008r2-expresssubquery

Nested select statement in FROM clause? Inner Join statements? or just table name?


I'm building a query that needs data from 5 tables. I've been told by a DBA in the past that specifying a list of columns vs getting all columns (*) is preferred from some performance/memory aspect. I've also been told that the database performs a JOIN operation behind the scenes when there's a list of tables in the FROM clause, to create one table (or view).

The existing database has very little data at the moment, as we're at a very initial point. So not sure I can measure the performance hit in practice. I am not a database pro. I can get what data I need. The dillema is, at what price.

Added: At the moment I'm working with MS SQL Server 2008 R2.

My questions are:

  1. Is there a performance difference and why, between the following: a. SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel that this might be a performance hit) b. SELECT ... FROM tbl1 inner join tbl2 on ... inner join tbl3 on ... etc (would this be more explicit to the server and save on performance/memory)? c. SELECT ... FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this save anythig? or is it just extra select statements that create more work for the server and for us)?

  2. Is there yet a better way to do this?

Below are two queries that both get the slice of data that I need. One includes more nested select statements.

I apologize if they are not written in a standard form or helplessly overcomplicated - hopefully you can decipher. I try to keep them organized as much as possible.

Insights would be most appreciated as well. Thanks for checking this out.

5 tables: devicepool, users, trips, TripTracker, and order

Query 1 (more select statements):

SELECT
      username, 
      base.devid devid, 
      tripstatus, 
      stops, 
      stopnumber, 
      [time], 
      [orderstatus], 
      [destaddress]

FROM    
((
 (  SELECT 
           username, 
           devicepool.devid devid,
           groupid
    FROM
           devicepool INNER JOIN users 
           ON devicepool.userid = users.userid 

     WHERE devicepool.groupid = 1
 ) 
 AS [base] 

 INNER JOIN

 (
      SELECT 
              tripid, 
              [status] tripstatus, 
              stops, 
              devid,
              groupid

      FROM 
              trips 
 ) 
 AS [base2]

 ON base.devid = base2.devid AND base2.groupid = base.groupid

 INNER JOIN

 (
     SELECT 
            stopnumber, 
            devid, 
            [time], 
            MAX([time]) OVER (PARTITION BY devid) latesttime 
     FROM 
            TripTracker
 ) 

 AS [tracker] 
 ON tracker.devid = base.devid AND [time] = latesttime)

 INNER JOIN

 (
      SELECT 
            [status] [orderstatus], 
            [address] [destaddress], 
            [tripid], 
            stopnumber orderstopnumber 
      FROM [order]
 ) 
 AS [orders] 

 ON orders.orderstopnumber = tracker.stopnumber)

Query 2:

SELECT
      username, 
      base.devid devid, 
      tripstatus, 
      stops, 
      stopnumber, 
      [time], 
      [orderstatus], 
      [destaddress]

FROM    
((
 (  SELECT 
           username, 
           devicepool.devid devid,
           groupid
    FROM
           devicepool INNER JOIN users 
           ON devicepool.userid = users.userid 

     WHERE devicepool.groupid = 1
 ) 
 AS [base] 

 INNER JOIN

     trips

 ON base.devid = trips.devid AND trips.groupid = base.groupid

 INNER JOIN

 (
     SELECT 
            stopnumber, 
            devid, 
            [time], 
            MAX([time]) OVER (PARTITION BY devid) latesttime 
     FROM 
            TripTracker
 ) 

 AS [tracker] 
 ON tracker.devid = base.devid AND [time] = latesttime)

 INNER JOIN

    [order]

 ON [order].stopnumber = tracker.stopnumber)

Solution

  • Is there a performance difference and why, between the following: a. SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel that this might be a performance hit) b. SELECT ... FROM tbl1 inner join tbl2 on ... inner join tbl3 on ... etc (would this be more explicit to the server and save on performance/memory)? c. SELECT ... FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this save anythig? or is it just extra select statements that create more work for the server and for us)?

    a) and b) should result in the same query plan (although this is db-specific). b) is much preferred for portability and readability over a). c) is a horrible idea, that hurts readability and if anything will result in worse peformance. Let us never speak of it again.

    Is there yet a better way to do this?

    b) is the standard approach. In general, writing the plainest ANSI SQL will result in the best performance, as it allows the query parser to easily understand what you are trying to do. Trying to outsmart the compiler with tricks may work in a given situation, but does not mean that it will still work when the cardinality or amount of data changes, or the database engine is upgraded. So, avoid doing that unless you are absolutely forced to.