I have an imported excel file, DATASET looks like:
Family Weight
1 150
1 210
1 99
2 230
2 100
2 172
I need to find the sum of ranks for each family.
I know that I can do this easily using PROC RANK but this is a HW problem and the only PROC statement I can use is PROC Means. I cannot even use Proc Sort.
The ranking would be as follows (lowest weight receives rank = 1, etc)
99 - Rank = 1
100 - Rank = 2
150 - Rank = 3
172 - Rank = 4
210 - Rank = 5
230 - Rank = 6
Resulting Dataset:
Family Sum_Ranking
1 9
2 12
Family 1 Sum_Ranking was calculated by (3+5+1)
Family 2 Sum_Ranking was calculated by (6+2+4)
Thank you for assistance.
I'm not going to give you code, but some tips.
Specifically, the most interesting part about the instructions is the explicit "not even PROC SORT".
PROC MEANS has a useful side effect, in that it sorts data by the class variables (in the class variable order). So,
PROC SORT data=blah out=blah_w;
by x y;
run;
and
PROC MEANS data=blah;
class x y;
var y;
output out=blah_w n=;
run;
Have almost the identical results. Both produce a dataset sorted by x y
, even though PROC MEANS didn't require a sort.
So in this case, you can use PROC MEANS' class statement to produce a dataset that is sorted by weight and family (you must carry over family here even though you don't need it). Then you must use a data step to produce a RANK variable, which is the rank of the current line (use the _FREQ_
column to figure that out in case there are more than one with the same rank in the same family, and think about what to do in case of ties), then another PROC MEANS
to summarize by family
this time.