Search code examples
datesasdate-formattingminimumproc-sql

Finding minimum/oldest create date for each subscriber in two different formats SAS


I have a dataset of call logs made to a group of subscribers called multiple times, with each call attempt on a new row. I need to analyze some data based on when the first contact was made to the subscriber, but I'm having trouble pulling the minimum date of contact. I've tried a few things, but the only successful output I created simply output the create date (which is the variable I need to get the minimum of for each subscriber's set of calls) that was already linked to the line, not the smallest create date in the dataset for that subscriber.

My ideal outcome would look like this (where min_date/min_month are the variables I'm trying to create):

Subscriber ID | Create Date | Min_date | Min_month

123456 | 01Jan2020 | 01Jan2020 | Jan2020
123456 | 05Mar2020 | 01Jan2020 | Jan2020

I've also had some issues with the output format. When I did get output, I get dates formatted in some sort of sequential numeric code? For example, I will get something like '22095' instead of any sort of actual date.

The code I'm using to pull a minimum date:


create table  min_dates as 

select 
sub_id,
min2.min_date2


from (select 'Subscriber ID'n as sub_id, min('Create Date'n) as  min_date2 from work.min) as min2

inner join work.min as min1 on min1.'Subscriber ID'n = min2.sub_id;



quit;

I think this might be working but I cannot really tell due to the formatting issues. When I re-join this subtable back to the main data pull at the end of my code all of the values are also missing. I need two more things out of this code if it is correct:

  • The same minimum date broken into a new column with just the month and year
  • The min_date formatted correctly into a readable date (I don't have a need for any particular date format as long as it is actually a date)

But if it is wrong and is simply giving me the date on the same row, I need to fix that too.

Any help is appreciated.


Solution

  • If you want to use SQL you will have to tell SAS what format to use for displaying your calculated variables. To aggregate by groups us the GROUP BY clause.

    select Subscriber_Id, min(Create_Date) as min_date2 format=date9.
      from have
      group by Subscriber_Id
    ;
    

    If you use PROC SUMMARY instead then the calculated variable will have the same format attached as the source variable. You can group by using CLASS statement or if the data is sorted using a BY statement.

    proc summary nway data=have ;
      class Subscriber_Id;
      var Create_Date ; 
      output out=want min=min_date2 ;
    run;
    

    To get your exact table with multiple observations per group you can use PROC SQL's ability to automatically remerge summary statistics with detail observations.

    select Subscriber_Id
         , Create_Date
         , min(Create_Date) as min_date format=date9.
         , min(Create_Date) as min_month format=monyy7.
      from have
      group by Subscriber_Id
    ;