Search code examples
sqloracle-databasegroup-byduplicatescandidate-key

Oracle - Arbitrarily pick one of multiple rows based on single unique key


Good Morning! I am looking for a trick to maintain a unique list of keys where a one-to-many relationship can occur.

Problem

I am working with a horribly non-normalized database at work and unfortunately redesign is out of the question. I have a 1NF master table which contains a number of transitive and partial-key dependencies similar to this:

Cmpd_Key  Group  Subgroup   Group_Desc
  A1    |   A   |    1   |    Same
  A2    |   A   |    2   |    Same
  B1    |   B   |    1   |    Same1
  B2    |   B   |    2   |    Same1
  C1    |   C   |    1   |    Diff1
  C2    |   C   |    2   |    Diff2  <---This field contains multiple values

I often need to pull a unique list of Group IDs but requirements usually demand the Group_Desc field as well. Unfortunately, due to poor data entry restrictions up-stream, this description field can contain multiple entries per Group which causes duplication as the Group field should be unique in most data pulls. For my purposes I don't really care which Group_Desc record I pull as long as I can maintain a relationship of 1 Group to 1 Group_Desc.

I've come up with an ugly solution that I refer to as an Inline View whenever I need to reference the Group_Desc field in a larger query but this kills my performance:

SELECT Group, Group_Desc
FROM Table t
WHERE Subgroup = (SELECT MIN(Subgroup)
                  FROM Table
                  WHERE Group = t.Group) --Nasty Correlated Subquery

Question

Does anyone have a performance-friendly trick for pulling back a single row of multiple values repeatedly within the same query? I'd like to be able to pull back Group and only the first Group_Desc that appears.

I'm envisioning something like this:

SELECT Group, Group_Desc
FROM Table t
GROUP BY Group, Group_Desc    
HAVING ROWNUM = [The lowest returned Rownum within the same Group]

A fellow developer mentioned the RANK function as a possible solution but I didn't see how I could use that to eliminate values.

Any help you can provide would be greatly appreciated!

----------------EDIT----------------------

So after some additional analysis, I was able to point to an omission in my original correlated subquery which caused an overly long execution plan. By adding a few additional predicates, the Optimizer was able to create a better plan which changed my execution time from around 12 to 2 minutes which is in line with my expectations.

I did experiment quite a bit with the Analytics solution that Ponder Stibbons suggested below. His solution is quite elegant and I have chosen as the answer for this question, however, I was unable to use it in this particular query as execution time was significantly slower than my original solution primarily due to an index I was able to utilize in my correlated subquery.

I have no doubt that in a fair comparison that the Analytics solution would run on par or better than the Correlated SubQuery solution. I appreciate everyone's assistance on this problem!


Solution

  • You can use min in analytical version here, it is fast:

    select 
        TGroup, 
        min(Group_Desc) over (partition by tgroup) 
      from t
    

    SQLFiddle demo

    first_value is also the option:

    select TGroup,
        first_value(Group_Desc) over (partition by tgroup order by subgroup) gd
      from t