I have a dataset of call logs made to a group of subscribers called multiple times, with each call attempt on a new row. I need to analyze some data based on when the first contact was made to the subscriber, but I'm having trouble pulling the minimum date of contact. I've tried a few things, but the only successful output I created simply output the create date (which is the variable I need to get the minimum of for each subscriber's set of calls) that was already linked to the line, not the smallest create date in the dataset for that subscriber.
My ideal outcome would look like this (where min_date/min_month are the variables I'm trying to create):
Subscriber ID | Create Date | Min_date | Min_month
123456 | 01Jan2020 | 01Jan2020 | Jan2020
123456 | 05Mar2020 | 01Jan2020 | Jan2020
I've also had some issues with the output format. When I did get output, I get dates formatted in some sort of sequential numeric code? For example, I will get something like '22095' instead of any sort of actual date.
The code I'm using to pull a minimum date:
create table min_dates as
select
sub_id,
min2.min_date2
from (select 'Subscriber ID'n as sub_id, min('Create Date'n) as min_date2 from work.min) as min2
inner join work.min as min1 on min1.'Subscriber ID'n = min2.sub_id;
quit;
I think this might be working but I cannot really tell due to the formatting issues. When I re-join this subtable back to the main data pull at the end of my code all of the values are also missing. I need two more things out of this code if it is correct:
But if it is wrong and is simply giving me the date on the same row, I need to fix that too.
Any help is appreciated.
If you want to use SQL you will have to tell SAS what format to use for displaying your calculated variables. To aggregate by groups us the GROUP BY clause.
select Subscriber_Id, min(Create_Date) as min_date2 format=date9.
from have
group by Subscriber_Id
;
If you use PROC SUMMARY instead then the calculated variable will have the same format attached as the source variable. You can group by using CLASS statement or if the data is sorted using a BY statement.
proc summary nway data=have ;
class Subscriber_Id;
var Create_Date ;
output out=want min=min_date2 ;
run;
To get your exact table with multiple observations per group you can use PROC SQL's ability to automatically remerge summary statistics with detail observations.
select Subscriber_Id
, Create_Date
, min(Create_Date) as min_date format=date9.
, min(Create_Date) as min_month format=monyy7.
from have
group by Subscriber_Id
;