Search code examples
mysqlsqlgreatest-n-per-groupquery-performance

How to improve performance getting recent records to display in list, recent top 5 most


I'm making a sample recent screen that will display a list, it displays the list, with id set as primary key.

I have done the correct query as expected but the table with big amount of data can cause slow performance issues.

This is the sample query below:

SELECT  distinct H.id    -- (Primary Key),
        H.partnerid as PartnerId,
        H.partnername AS partner, H.accountname AS accountName,
        H.accountid as AccountNo,
    FROM  myschema.mytransactionstable H
    INNER JOIN (
        SELECT  S.accountid, S.partnerid, S.accountname,
                max(S.transdate) AS maxDate
            from  myschema.mytransactionstable S
            group by  S.accountid, S.partnerid, S.accountname
               ) ms  ON H.accountid = ms.accountid
      AND  H.partnerid = ms.partnerid
      AND  H.accountname =ms.accountname
      AND  H.transdate = maxDate
    WHERE  H.accountid = ms.accountid
      AND  H.partnerid = ms.partnerid
      AND  H.accountname = ms.accountname
      AND  H.transdate = maxDate
    GROUP BY  H.partnerid,H.accountid, H.accountname
    ORDER BY  H.id DESC
    LIMIT  5

In my case, there are values which are similar in the selected columns but differ only in their id's

Below is a link to an image without executing the query above. They are all the records that have not yet been filtered. Sample result query click here

Since I only want to get the 5 most recent by their id but the other columns can contain similar values accountname,accountid,partnerid.

I already got the correct query but, I want to improve the performance of the query. Any suggestions for the improvement of query?


Solution

    • Don't repeat ON and WHERE clauses. Use ON to say how the tables (or subqueries) are "related"; use WHERE for filtering (that is, which rows to keep). Probably in your case, all the WHERE should be removed.

    • Please provide SHOW CREATE TABLE

    • This 'composite' index would probably help because of dealing with the subquery and the JOIN:

        INDEX(partnerid, accountid, accountname, transdate)
      
    • That would also avoid a separate sort for the GROUP BY.

    • But then the ORDER BY is different, so it cannot avoid a sort.

    • This might avoid the sort without changing the result set ordering: ORDER BY partnerid, accountid, accountname, transdate DESC

    • Please provide EXPLAIN SELECT ... and EXPLAIN FORMAT=JSON SELECT ... if you have further questions.

    If we cannot get an index to handle the WHERE, GROUP BY, and ORDER BY, the query will generate all the rows before seeing the LIMIT 5. If the index does work, then the outer query will stop after 5 -- potentially a big savings.