I have a dataset with a variable with company names. Sometimes, these company names have a person ID as their last 11 characteres.
So I want to create a new variable with their last 11 characteres and keep it only if they are numbers.
Here is an example of my dataset:
DATA have;
INPUT CompanyName;
DATALINES;
ASASD FSFASD ASDFASDFASD
ASDFAS ADFASDF FDAFADFADF 01232143211
ADAFSAD ADSFASDF 03331286532
LKJLKJK KLKJKJL
LLKJLKJ FDAFASDF 75423243211
;
RUN;
I want a new variable with missing values for lines 1 and 4 and the last 11 numeric values for lines 2, 3 5 (01232143211
, 03331286532
, 75423243211
).
What have I tried?
PROC SQL;
CREATE TABLE WANT AS
SELECT
CompanyName,
case when prxmatch('/^\d{11}$/', substr(CompanyName, length(CompanyName) - 10)) > 0
then substr(CompanyName, length(CompanyName) - 10)
else "" end as ID_OWNER,
FROM have;
QUIT;
But I've got a new variable with all missing values.
Much easier to do with SAS code instead of SQL code.
First makes sure your data step to create the example data actually works.
data have;
input CompanyName $50. ;
datalines;
ASASD FSFASD ASDFASDFASD
ASDFAS ADFASDF FDAFADFADF 01232143211
ADAFSAD ADSFASDF 03331286532
LKJLKJK KLKJKJL
LLKJLKJ FDAFASDF 75423243211
;
Then if the string is long enough to have 11 characters then take the last 11 characters and make sure that they are all digits.
data want;
set have;
length id $11 ;
if length(companyname) > 11 then do;
id = substr(companyname,length(companyname)-10);
if verify(id,'0123456789') then id=' ';
end;
run;
Result
Obs CompanyName id
1 ASASD FSFASD ASDFASDFASD
2 ASDFAS ADFASDF FDAFADFADF 01232143211 01232143211
3 ADAFSAD ADSFASDF 03331286532 03331286532
4 LKJLKJK KLKJKJL
5 LLKJLKJ FDAFASDF 75423243211 75423243211
You might want to also remove the id value when it exists. So add this ELSE statement after that last IF statement.
else companyname = substr(companyname,1,length(companyname)-11);
Results
Obs CompanyName id
1 ASASD FSFASD ASDFASDFASD
2 ASDFAS ADFASDF FDAFADFADF 01232143211
3 ADAFSAD ADSFASDF 03331286532
4 LKJLKJK KLKJKJL
5 LLKJLKJ FDAFASDF 75423243211