I am working with a twin dataset and would like to create a table with the Subject ID (Subject) and twin pair ID (twpair) for the twins with the lower (or one of the twins if the values are equal) lifetime total of marijuana use (MJ1a).
A portion of my table looks like this:
Subject | twpair | MJ1a |
---|---|---|
156 | 345 | 10 |
157 | 345 | 7 |
158 | 346 | 20 |
159 | 346 | 3 |
160 | 347 | 4 |
161 | 347 | 4 |
I'm hoping to create a table with only the twins that have the lower amount of marijuana use which would look like this:
Subject | twpair | MJ1a |
---|---|---|
157 | 345 | 7 |
159 | 346 | 3 |
161 | 347 | 4 |
This is the SQL code I have so far:
proc sql;
create table one_twin as
select twpair,min(MJ1a) as minUse, Subject
from twins_deviation
group by twpair;
Unfortunately this ends up causing all of the subjects to be remerged back in the dataset. If I don't include the Subject portion I get the correct values for twpair and MJ1a but not the Subject IDs.
How do I filter the dataset to only include those with the minimum values while also including variables of interest like Subject ID? Note that if two pairs of twins have the SAME value I would like to select one but it doesn't matter which I select. Any tips would be extremely appreciated!
This query should give you the desired result.
select a.subject,a.twpair,a.MJ1a from twins_deviation a join (select twpair,min(mj1a) as mj1a from twins_deviation group by twpair)b on a.twpair=b.twpair and a.mj1a=b.mj1a
If your DB supports analytic/window functions ,the same can be accomplished using a rank function ,solution given below.
EDIT1:to handle same values for mj1a
select subject,twpair,mj1a from(select subject,twpair,mj1a ,row_number() over(partition by twpair order by mj1a) as rnk from twins_deviation)out1 where rnk=1;
EDIT2:Updated solution 1 to include only one twin.
select min(subject) as subject,twpair,mj1a from(select a.subject as subject ,a.twpair as twpair,a.MJ1a as MJ1a from twins_deviation a join (select twpair,min(mj1a) as mj1a from twins_deviation group by twpair)b on a.twpair=b.twpair and a.mj1a=b.mj1a)out1 group by twpair,MJ1a;