Search code examples
sqloracledategreatest-n-per-group

SQL: Select row with max value and group by a single column


First time asking a question on stackoverflow so apologies for any mistakes.

I'm trying to transform an old table into a new format.

Old table / Input example:

| id                               | collectionId                     | infoText   |
|----------------------------------|----------------------------------|------------|
| 20200227112631476162094432822589 | 20200227112630931296846572143651 | Step 0     |
| 20200227112631512664092998338570 | 20200227112630931296846572143651 | Step 1     |
| 20200227112631652576662844108316 | 20200227112630931296846572143651 | Successful |

New table / Output:

| collectionId                     | startTimestamp                  | stopTimeStamp                   | lastStatus |
|----------------------------------|---------------------------------|---------------------------------|-------------|
| 20200227112630931296846572143651 | 27-FEB-20 11.26.30.931000000 AM | 27-FEB-20 11.26.50.911000000 AM | Successful  |

Essentially the following is required:

  • Create a row from the information of the latest row from a collection:
    • The row with the greatest id and the same collection id.
  • Transform the first 17 characters from the collection id into a start-timestamp
    (ex: 2020022711263093 -> 27-FEB-20 11.26.30.931000000 AM).
  • Transform the first 17 characters from the latest id from that collection into a stop-timestamp. (ex: 2020022711263165 -> 27-FEB-20 11.26.50.911000000 AM).

I've been trying to do this all day and i feel like i'm close to solving it. However it seems that everything i try results in a different error.

My latest attempt:

CREATE table newTable AS

SELECT
    a.collectionId                                                                AS collectionId,
    a.id                                                                          AS id,
    to_timestamp(substr(a.collectionId , 0, 17), 'YYYYMMDDHH24MISSFF')            AS starttimestamp,
    "STOPTIMESTAMP"                                                               AS stoptimestamp,
    a.infoText                                                                    AS lastStatus,
FROM
    oldTable a
    INNER JOIN (
        SELECT
            MAX(id),
            to_timestamp(substr(MAX(id), 0, 17), 'YYYYMMDDHH24MISSFF')            AS stoptimestamp,
            collectionId                                                          AS collectionId
        FROM
            oldTable
        GROUP BY
            collectionId
    ) b ON a.collectionId = b.collectionId
           AND stoptimestamp = b.stoptimestamp;

This however results in a table with duplicate collection-id's.

I would really appreciate the help as i'm not that experienced with SQL. The example shown here is modified to be more simple, the table that i'm working with has more (extra text) fields and contains over 2M+ rows. It is an Oracle XE 18c DB if that helps.

Thanks for the help!


Solution

  • You can use window functions to identity the last record per group, and then do the date conversions:

    select
        collection_id,
        to_timestamp(substr(collection_id, 1, 17), 'yyyymmddhh24missff') start_timestamp,
        to_timestamp(substr(id,            1, 17), 'yyyymmddhh24missff') end_timestamp,
        info_text last_status
    from (
        select 
            t.*,
            row_number() over(partition by collection_id order by id desc) rn
        from mytable t
    ) t
    where rn = 1
    

    Demo on DB Fiddle:

                       COLLECTION_ID | START_TIMESTAMP              | END_TIMESTAMP                | LAST_STATUS
    -------------------------------: | :--------------------------- | :--------------------------- | :----------
    20200227112630931296846572143651 | 27-FEB-20 11.26.30.931000000 | 27-FEB-20 11.26.31.652000000 | Successful