Search code examples
sas

How to extract into a new variable the last 11 characters of a string variable in SAS only if they are numbers


I have a dataset with a variable with company names. Sometimes, these company names have a person ID as their last 11 characteres.

So I want to create a new variable with their last 11 characteres and keep it only if they are numbers.

Here is an example of my dataset:

DATA have;
  INPUT CompanyName;
  DATALINES;
ASASD FSFASD ASDFASDFASD
ASDFAS ADFASDF FDAFADFADF 01232143211
ADAFSAD ADSFASDF 03331286532
LKJLKJK KLKJKJL
LLKJLKJ FDAFASDF 75423243211
;
RUN;

I want a new variable with missing values for lines 1 and 4 and the last 11 numeric values for lines 2, 3 5 (01232143211, 03331286532, 75423243211).

What have I tried?

PROC SQL;
   CREATE TABLE WANT AS 
   SELECT 
       CompanyName,
       case when prxmatch('/^\d{11}$/', substr(CompanyName, length(CompanyName) - 10)) > 0 
               then substr(CompanyName, length(CompanyName) - 10) 
               else "" end as ID_OWNER,
      FROM have;
QUIT;

But I've got a new variable with all missing values.


Solution

  • Much easier to do with SAS code instead of SQL code.

    First makes sure your data step to create the example data actually works.

    data have;
      input CompanyName $50. ;
    datalines;
    ASASD FSFASD ASDFASDFASD
    ASDFAS ADFASDF FDAFADFADF 01232143211
    ADAFSAD ADSFASDF 03331286532
    LKJLKJK KLKJKJL
    LLKJLKJ FDAFASDF 75423243211
    ;
    

    Then if the string is long enough to have 11 characters then take the last 11 characters and make sure that they are all digits.

    data want;
      set have;
      length id $11 ;
      if length(companyname) > 11 then do;
        id = substr(companyname,length(companyname)-10);
        if verify(id,'0123456789') then id=' ';
      end;
    run;
    

    Result

    Obs    CompanyName                                  id
    
     1     ASASD FSFASD ASDFASDFASD
     2     ASDFAS ADFASDF FDAFADFADF 01232143211    01232143211
     3     ADAFSAD ADSFASDF 03331286532             03331286532
     4     LKJLKJK KLKJKJL
     5     LLKJLKJ FDAFASDF 75423243211             75423243211
    

    You might want to also remove the id value when it exists. So add this ELSE statement after that last IF statement.

    else companyname = substr(companyname,1,length(companyname)-11);
    

    Results

    Obs    CompanyName                      id
    
     1     ASASD FSFASD ASDFASDFASD
     2     ASDFAS ADFASDF FDAFADFADF    01232143211
     3     ADAFSAD ADSFASDF             03331286532
     4     LKJLKJK KLKJKJL
     5     LLKJLKJ FDAFASDF             75423243211