Good Morning! I am looking for a trick to maintain a unique list of keys where a one-to-many relationship can occur.
Problem
I am working with a horribly non-normalized database at work and unfortunately redesign is out of the question. I have a 1NF master table which contains a number of transitive and partial-key dependencies similar to this:
Cmpd_Key Group Subgroup Group_Desc
A1 | A | 1 | Same
A2 | A | 2 | Same
B1 | B | 1 | Same1
B2 | B | 2 | Same1
C1 | C | 1 | Diff1
C2 | C | 2 | Diff2 <---This field contains multiple values
I often need to pull a unique list of Group
IDs but requirements usually demand the Group_Desc
field as well. Unfortunately, due to poor data entry restrictions up-stream, this description field can contain multiple entries per Group
which causes duplication as the Group
field should be unique in most data pulls. For my purposes I don't really care which Group_Desc
record I pull as long as I can maintain a relationship of 1 Group
to 1 Group_Desc
.
I've come up with an ugly solution that I refer to as an Inline View
whenever I need to reference the Group_Desc
field in a larger query but this kills my performance:
SELECT Group, Group_Desc
FROM Table t
WHERE Subgroup = (SELECT MIN(Subgroup)
FROM Table
WHERE Group = t.Group) --Nasty Correlated Subquery
Question
Does anyone have a performance-friendly trick for pulling back a single row of multiple values repeatedly within the same query? I'd like to be able to pull back Group
and only the first Group_Desc
that appears.
I'm envisioning something like this:
SELECT Group, Group_Desc
FROM Table t
GROUP BY Group, Group_Desc
HAVING ROWNUM = [The lowest returned Rownum within the same Group]
A fellow developer mentioned the RANK
function as a possible solution but I didn't see how I could use that to eliminate values.
Any help you can provide would be greatly appreciated!
----------------EDIT----------------------
So after some additional analysis, I was able to point to an omission in my original correlated subquery which caused an overly long execution plan. By adding a few additional predicates, the Optimizer was able to create a better plan which changed my execution time from around 12 to 2 minutes which is in line with my expectations.
I did experiment quite a bit with the Analytics solution that Ponder Stibbons suggested below. His solution is quite elegant and I have chosen as the answer for this question, however, I was unable to use it in this particular query as execution time was significantly slower than my original solution primarily due to an index I was able to utilize in my correlated subquery.
I have no doubt that in a fair comparison that the Analytics solution would run on par or better than the Correlated SubQuery solution. I appreciate everyone's assistance on this problem!
You can use min
in analytical version here, it is fast:
select
TGroup,
min(Group_Desc) over (partition by tgroup)
from t
first_value
is also the option:
select TGroup,
first_value(Group_Desc) over (partition by tgroup order by subgroup) gd
from t