Search code examples
sqlsql-serverjoinquery-optimization

Improve a query consisting multiple subqueries using joins


server_details

hostname category active
server1 small yes
server2 big yes

server_stat

hostname metric value lastupdated
server1 cpu 10 03:30:37
server1 ram 25 03:30:37
server1 disk 57 03:30:37
server2 cpu 23 03:30:37
server2 ram 55 03:30:37
server2 disk 33 03:30:37
server1 cpu 13 03:25:37
server1 ram 25 03:25:37
server1 disk 43 03:25:37
server2 cpu 83 03:25:37
server2 ram 95 03:25:37
server2 disk 63 03:25:37

Result should look like with latest lastupdated value

hostname category active cpu ram disk
server1 small yes 10 25 57
server2 big yes 23 55 33

I am using SQL Server database I wrote the below query and it is giving desired output

SELECT hd.hostname, hd.category, hd.active,
       (SELECT top 1 hs_cpu.value 
        FROM server_stat hs_cpu 
        WHERE hs_cpu.hostname = hd.hostname AND hs_cpu.metric = 'cpu' 
        ORDER BY hs_cpu.lastupdated DESC) AS cpu,
       (SELECT top 1 hs_ram.value 
        FROM server_stat hs_ram 
        WHERE hs_ram.hostname = hd.hostname AND hs_ram.metric = 'ram' 
        ORDER BY hs_ram.lastupdated DESC) AS ram,
       (SELECT top 1 hs_disk.value 
        FROM server_stat hs_disk 
        WHERE hs_disk.hostname = hd.hostname AND hs_disk.metric = 'disk' 
        ORDER BY hs_disk.lastupdated DESC) AS disk
FROM server_details hd;

Can we optimize this query using joins without subqueries?


Solution

  • I'm not sure if this would actually be any faster but it is a slightly more set-oriented approach, using PIVOT and joining back to the details: https://dbfiddle.uk/Ebw30T90

    select *
    from
    (
        SELECT Latest.hostname, sd.category, sd.active, latest.metric, v.value
        FROM (
            SELECT hostname,metric, MAX(lastupdated) lastupdated
            FROM server_stat
            GROUP BY hostname,metric
        ) as Latest
        INNER JOIN server_stat v ON Latest.hostname = v.hostname AND Latest.metric = v.metric AND Latest.lastupdated = v.lastupdated
        INNER JOIN server_details sd on Latest.hostname = sd.hostname
    ) x
    pivot
    (
      max(value)
      for metric in([cpu],[ram],[disk])
    ) pvt
    
    hostname category active cpu ram disk
    server1 small yes 10 25 57
    server2 big yes 23 55 33

    The problem with this data, that makes it harder to use PIVOT is that the aggregate value (In this case lastupdated) is not the value that we want to see, instead we want a the value column from the corresponding row with the MAX(lastupdated).

    The key is to prepare the set that we want to PIVOT first, use GROUP BY in place of the TOP 1 / LIMIT 1, and then join back to the server_stat table to get the value

    SELECT Latest.hostname, sd.category, sd.active, latest.metric, v.value
    FROM (
        SELECT hostname,metric, MAX(lastupdated) lastupdated
        FROM server_stat
        GROUP BY hostname,metric
    ) as Latest
    INNER JOIN server_stat v ON Latest.hostname = v.hostname AND Latest.metric = v.metric AND Latest.lastupdated = v.lastupdated
    INNER JOIN server_details sd on Latest.hostname = sd.hostname
    

    I have also included the category field from the server_details so that we don't have to re-query that and re-write out all the field names

    hostname category active metric value
    server2 big yes ram 55
    server2 big yes disk 33
    server2 big yes cpu 23
    server1 small yes ram 25
    server1 small yes disk 57
    server1 small yes cpu 10

    Update: Thanks to Joel Coehoorn!

    This is a thing of beauty ;) We can combine this approach with this solution by Joel Coehoorn to eliminate the GROUP BY that joins back on itself to access the correlated data. I can't really comment on performance of window queries vs group by, but there is a lot less code to manage this way.

    select *
    from
    (
        -- Thanks Joel Coehoorn!
        SELECT distinct sd.hostname, sd.category, sd.active, metric
           , first_value(value) over (partition by sd.hostname, ss.metric 
                                  order by lastupdated desc) value
        FROM Server_Details sd
        INNER JOIN Server_Stat ss on ss.hostname = sd.hostname
    ) x
    pivot
    (
      max(value)
      for metric in([cpu],[ram],[disk])
    ) pvt
    

    Updated Fiddle: (look at the last query) https://dbfiddle.uk/7D9lGpZI