Search code examples
sqlpervasive-sql

Select all rows but only the latest (highest) version from database


I want to retrieve all rows from the DB where the OWNERKEY is 1 but only the highest DATAVERSION of the DATACONTROLID.

In the example below I have two rows where DATACONTROLID= 1, they have 1 and 2 as DATAVERSION. I want to get the highest.

MyDB:

DATAKEY       OWNERKEY       OWNERTYPE      DATAVERSION   MALLKEY       DATAVALUE    DATAVALUETYPE   DATACONTROLID   DATADATE     DATATIME      DATASIGN
===========   ============   ===========   ===========   ===========   =========    ============    =============   ==========   ===========   =========     
4              1             2             1             1             1             2                1             2015-11-24   09:55:00:00   ADMIN
3              1             2             2             1             2             2                1             2015-11-23   20:55:00:00   ADMIN
2              1             2             1             1             3             2                2             2015-11-23   15:39:00:00   ADMIN
1              1             2             1             1             4             2                3             2015-11-23   11:29:00:00   ADMIN

Wanted result:

DATAKEY       OWNERKEY       OWNERTYPE      DATAVERSION   MALLKEY       DATAVALUE    DATAVALUETYPE   DATACONTROLID   DATADATE     DATATIME      DATASIGN
===========   ============   ===========   ===========   ===========   =========    ============    =============   ==========   ===========   =========     
3              1             2             2             1             2             2                1             2015-11-23   20:55:00:00   ADMIN
2              1             2             1             1             3             2                2             2015-11-23   15:39:00:00   ADMIN
1              1             2             1             1             4             2                3             2015-11-23   11:29:00:00   ADMIN

Where do I start?

SELECT *
FROM MyDB
WHERE OWNERKEY = 1

Statement above is the obviuos part, but how do I proceed from that?

I think I should use MAX(DATAVERSION) somehow, but what to group on? And can I use both * and MAX?


Solution

  • Here's a simple example using GROUP BY in MSSQL:

    SELECT  DATACONTROLID, 
            MAX(DATAVERSION) AS DATAVERSION, -- Note that 'AS' just gives the column a name in the result set, this isn't required.
            OWNERKEY
    FROM    MyDB
    WHERE   OWNERKEY = 1
    GROUP BY DATACONTROLID, OWNERKEY
    

    This is assuming that the DATACONTROLID determines 'distinct' records and DATAVERSION is the instance of the record.

    This will group the DATACONTROLID column and return the MAX(DATAVERSION) for the grouped result. In this case, DATACONTROLID = 1 has DATAVERSION = 1 and 2, so this returns 2.

    It's worth noting that if you want to see additional columns in your result and they're not aggregates such as MAX(), then you'll need to add them to the GROUP BY clause - as I have done here with the OWNERKEY column. If you try to SELECT OWNERKEY without having it in the GROUP BY, you'll receive an error.

    EDIT:

    Here's a better way to accomplish the same result by using a JOIN:

    SELECT  *
    FROM    MyDB mdb
    INNER JOIN (
        SELECT  MAX(DATAVERSION) AS DATAVERSION,
                DATACONTROLID
        FROM    MyDB
        WHERE   OWNERKEY = 1
        GROUP BY DATACONTROLID
    ) AS mdb2 
        ON  mdb2.DATACONTROLID = mdb.DATACONTROLID 
            AND mdb2.DATAVERSION = mdb.DATAVERSION
    

    What this does is turns the same GROUP BY statement that I showed you before into a filtering table. The INNER JOIN portion is selecting the MAX(DATAVERSION) and the DATACONTROLID it belongs to, and returning the result as the temp table mdb2.

    This new inner table will return the following:

    DATAVERSION DATACONTROLID
    2           1
    1           2
    1           3
    

    Then we're taking that result and getting all of the rows (SELECT *) that match this criteria. Since this inner table doesn't contain a result for DATAVERSION = 1 where DATACONTROLID = 1, this row gets filtered out.

    Conversely, if you only wanted to see the older versions and not the newest, you can change this criteria to LEFT OUTER JOIN and add WHERE mdb2.DATAVERSION IS NULL.

    SELECT  *
    FROM    MyDB mdb
    LEFT OUTER JOIN (
        SELECT  MAX(DATAVERSION) AS DATAVERSION,
                DATACONTROLID
        FROM    MyDB
        WHERE   OWNERKEY = 1
        GROUP BY DATACONTROLID
    ) AS mdb2 
        ON  mdb2.DATACONTROLID = mdb.DATACONTROLID 
            AND mdb2.DATAVERSION = mdb.DATAVERSION
    WHERE mdb2.DATAVERSION IS NULL
    

    Since we're selecting the MAX(DATAVERSION) in mdb2, it will contain null for the rows that don't meet this criteria.