Search code examples
mysqlsqloptimizationquery-optimization

Selecting single max values


Say I need to pull data from several tables like so:

item 1 - from table 1
item 2 - from table 1
item 3 - from table 1 - but select only max value of item 3 from table 1
item 4 - from table 2 - but select only max value of item 4 from table 2

My query is pretty simple:

select
    a.item 1,
    a.item 2,
    b.item 3,
    c.item 4
from table 1 a
left join (select b.key_item, max(item 3) from table 1, group by key_item) b on a.key_item = b.key_item
left join (select c.key_item, max(item 4) from table 2, group by key_item) c on c.key_item = a.key_item

I am not sure if my methodology of pulling just a single max item from a table is the most efficient. Assume both tables are over a million rows. my actual sql run forever using this sql setup.

EDIT: I changed the group by clause to reflect comments made. I hope it makes a bit of sense now?


Solution

  • Your best bet is to add an index on table1 and table2, as follows:

    ALTER TABLE table1
    ADD INDEX `GoodIndexName1` (`key_item`,`item3`)
    
    ALTER TABLE table2
    ADD INDEX `GoodIndexName2` (`key_item`,`item4`)
    

    This will allow you to use queries as described in the MySQL documentation for finding the rows holding the group-wise maximum, which appears to be what you are looking for.

    Your original (edited) query should work:

    select
        a.item1,
        a.item2,
        b.item3,
        c.item4
    from table1 a
    LEFT OUTER JOIN (
        SELECT 
        b.key_item, 
        MAX(item3) AS item3
        FROM table1
        GROUP BY key_item
    ) b 
    ON a.key_item = b.key_item
    LEFT OUTER JOIN (
        SELECT 
        c.key_item, 
        MAX(item4) 
        FROM table2
        GROUP BY key_item
    ) c 
    ON c.key_item = a.key_item
    

    and if that performs slowly after adding the indexes, try the following too:

    SELECT
        a.item1,
        a.item2,
        b.item3,
        c.item4
    FROM table1 a
    LEFT OUTER JOIN table1 b
    ON b.key_item = a.key_item
    LEFT OUTER JOIN table1 larger_b
    ON larger_b.key_item = b.key_item
    AND larger_b.item3 > b.item_3
    LEFT OUTER JOIN table2 c
    ON c.key_item = a.key_item
    LEFT OUTER JOIN table2 larger_c
    ON larger_c.key_item = c.key_item
    AND larger_c.item4 > c.item4
    WHERE larger_b.key_item IS NULL
    AND larger_c.key_item IS NULL
    

    (I have modified the table and column names only slightly, so that they conform to correct MySQL syntax. )

    I work with queries that use the above structure all the time, and they perform very efficiently with indexes like the one I provided.

    That said, usually I am using INNER JOINs on the b and c tables, but I don't see why your query should have any issues.

    If you do experience performance problems still, report the data types of the key_item columns for each table, as if you try to join on different data types, you will generally get poor performance.