Search code examples
sqloracle-databasegreatest-n-per-group

How can I select rows in an SQL table by using multiple substrings of a single column value?


I am using Oracle SQL Developer to query a database of cases, and I need to use three different identifiers to select the correct row for each case; however, my problem is that two of the identifiers are contained in the same text string, and I can’t figure out how to parse them to use in the query.

In the following table, the identifiers I need to use are:

  1. ID-1, characters 3–6 in the "Case" column (e.g., "1001"),
  2. ID-2, characters 8–9 in the "Case" column (e.g., "01") when they appear (treated as "00" if they don't appear), and
  3. ID-3, the value in the "Sequence" column (e.g., "672").

Source Table

Case Sequence Value 1 Value 2
AA1001 672 73 195
AA1001 711 73 185
AA1001-01 680 73 185
AA1001-02 685 72 185
AA1001-02 699 72 182
AB1002 676 51 36
AB1002-01 701 48 39
AB1002-01 719 48 35
AB1002-02 707 51 38
AA1003 655 122 416
AA1003 683 113 416

I want to return one row for each unique ID-1, such that first the greatest value for ID-2 is selected, and then the greatest value for ID-3 in that subset is chosen; so, the query should return only the following three rows from the table above.

Result Table

Case Sequence Value 1 Value 2
AA1001-02 699 72 182
AB1002-02 707 51 38
AA1003 683 113 416

I've tried taking the maximum for ID-2 using the following, but it only returns the rows where ID-2 equals "02".

SELECT * 
FROM table 
WHERE SUBSTR(Case,3,4) in ('1001','1002','1003') 
and SUBSTR(Case,8,2) = (SELECT MAX(SUBSTR(Case,8,2)) 
FROM table 
WHERE SUBSTR(Case,3,4) in ('1001','1002','1003'))

(The easiest answer is probably just to add a column for ID-2; however, the source database is strictly read-only, so I can't make that sort of change.)


Solution

  • In Oracle 12.1 and higher, you can use the match_recognize clause:

    select case, sequence, value1, value2
    from   (
             select t.*, substr(case, 3, 4) as id_1, substr(case, 8, 2) as id_2
             from   table_name t
           )
    match_recognize(
      partition by id_1
      order     by id_2 desc nulls last, sequence desc nulls last
      all rows per match
      pattern   (^ x)
      define    x as null is null
    );
    
    
    CASE        SEQUENCE     VALUE1     VALUE2
    --------- ---------- ---------- ----------
    AA1001-02        699         72        182
    AB1002-02        707         51         38
    AA1003           683        113        416