Search code examples
sqlfilterdatatablemin

Including variables after filtering selecting only minimum values in SQL?


I am working with a twin dataset and would like to create a table with the Subject ID (Subject) and twin pair ID (twpair) for the twins with the lower (or one of the twins if the values are equal) lifetime total of marijuana use (MJ1a).

A portion of my table looks like this:

Subject twpair MJ1a
156 345 10
157 345 7
158 346 20
159 346 3
160 347 4
161 347 4

I'm hoping to create a table with only the twins that have the lower amount of marijuana use which would look like this:

Subject twpair MJ1a
157 345 7
159 346 3
161 347 4

This is the SQL code I have so far:

proc sql; 
create table one_twin as 
select twpair,min(MJ1a) as minUse, Subject 
from twins_deviation
group by twpair; 

Unfortunately this ends up causing all of the subjects to be remerged back in the dataset. If I don't include the Subject portion I get the correct values for twpair and MJ1a but not the Subject IDs.

How do I filter the dataset to only include those with the minimum values while also including variables of interest like Subject ID? Note that if two pairs of twins have the SAME value I would like to select one but it doesn't matter which I select. Any tips would be extremely appreciated!


Solution

  • This query should give you the desired result.

    select a.subject,a.twpair,a.MJ1a from twins_deviation a join (select twpair,min(mj1a) as mj1a from twins_deviation group by twpair)b on a.twpair=b.twpair and a.mj1a=b.mj1a

    If your DB supports analytic/window functions ,the same can be accomplished using a rank function ,solution given below.

    EDIT1:to handle same values for mj1a

    select subject,twpair,mj1a from(select subject,twpair,mj1a ,row_number() over(partition by twpair order by mj1a) as rnk from twins_deviation)out1 where rnk=1;

    EDIT2:Updated solution 1 to include only one twin.

    select min(subject) as subject,twpair,mj1a from(select a.subject as subject ,a.twpair as twpair,a.MJ1a as MJ1a from twins_deviation a join (select twpair,min(mj1a) as mj1a from twins_deviation group by twpair)b on a.twpair=b.twpair and a.mj1a=b.mj1a)out1 group by twpair,MJ1a;