I'm trying to select the top 1 record belonging to each group in proc sql, but not been able to identify a correct way to do it in proc sql. I'm trying to find top account tied to the store with transactions and total_sales ordered in descending manner. In simple words, one account may be tied to multiple stores, so within each account, I'm trying to filter the top record with highest transactions and sales.
I have an excel file which I'm importing using proc import, but providing a sample table below as an example.
| account | store_num | transactions | sales |
|---------|-----------|--------------|-------|
| 101 | 1000 | 3 | 90 |
| 101 | 1000 | 6 | 170 |
| 105 | 1015 | 5 | 156 |
| 102 | 1234 | 9 | 200 |
| 101 | 1234 | 7 | 189 |
| 102 | 1234 | 11 | 267 |
| 105 | 1030 | 5 | 170 |
I'm initially grouping the account and store_num by sum of transactions and sales in descending order and then used PROC RANK to assign rank.
Below is the code that I have tried so far.
proc import datafile="/myfolder/eg.xlsx"
out=fileg dbms=xlsx replace;
run;
proc sql;
create table tempp as
Select account, store_num, sum(transactions) as total_Transactions, sum(sales) as Total_Sales,
from fileg
group by account, store_num
order by total_transactions desc, Total_Sales desc;
quit;
proc sql;
create table main2 as
select * from tempp
order by account;
quit;
proc rank data=main2 out=result ties=dense descending;
by account;
var total_transactions;
ranks transRank;
run;
proc print data=result n;
run;
The output of 'result' looks like the one below:
| account | store_num | transactions | sales | transRank |
|---------|-----------|--------------|-------|-----------|
| 102 | 1234 | 20 | 467 | 2 |
| 101 | 1000 | 9 | 260 | 1 |
| 101 | 1234 | 7 | 189 | 1 |
| 105 | 1030 | 5 | 170 | 1 |
| 105 | 1015 | 5 | 156 | 1 |
I want my final table to look like the one below so that it ranks and filter outs the first record for each account having highest transactions. In case of a tie in the rank due to same transactions, the highest sales will be used to provide the rank.
| account | store_num | transactions | sales |
|---------|-----------|--------------|-------|
| 102 | 1234 | 20 | 467 |
| 101 | 1000 | 9 | 260 |
| 105 | 1030 | 5 | 170 |
Based on your current code, you could do the following, use FIRST to get the top record after sorting the tempp file.
proc sort data=tempp;
by account descending transactions descending sales;
run;
data want;
set tempp;
by account;
if first.account;
run;