How do you extract all values containing part of a particular number and then delete them? I have data where the ID contains different lengths and wants to extract all the IDs with a particular number. For example, if the ID contains either "-00" or "02" or "-01" at the end, pull to be able to see the hit rate that includes those—then delete them from the ID. Is there a more effecient way in creating this code?
I tried to use the substring function to slice it to get the result, but there is some other ID along with the specified position.
Code:
Proc sql;
Create table work.data1 AS
SELECT Product, Amount_sold, Price_per_unit,
CASE WHEN Product Contains "Pen" and Lenghth(ID) >= 9 Then ID = SUBSTR(ID,1,9)
WHEN Product Contains "Book" and Lenghth(ID) >= 11 Then ID = SUBSTR(ID,1,11)
WHEN Product Contains "Folder" and Lenghth(ID) >= 12 Then ID = SUBSTR(ID,1,12)
...
END AS ID
FROM A
Quit;
Have:
+------------------+-----------------+-------------+----------------+
| ID | Product | Amount_sold | Price_per_unit |
+------------------+-----------------+-------------+----------------+
| 123456789 | Pen | 30 | 2 |
| 63495837229-01 | Book | 20 | 5 |
| ABC134475472 02 | Folder | 29 | 7 |
| AB-1235674467-00 | Pencil | 26 | 1 |
| 69598346-02 | Correction pen | 15 | 1.50 |
| 6970457688 | Highlighter | 15 | 2 |
| 584028467 | Color pencil | 15 | 10 |
+------------------+-----------------+-------------+----------------+
Wanted the final result:
+------------------+-----------------+-------------+----------------+
| ID | Product | Amount_sold | Price_per_unit |
+------------------+-----------------+-------------+----------------+
| 123456789 | Pen | 30 | 2 |
| 63495837229 | Book | 20 | 5 |
| ABC134475472 | Folder | 29 | 7 |
| AB-1235674467 | Pencil | 26 | 1 |
| 69598346 | Correction pen | 15 | 1.50 |
| 6970457688 | Highlighter | 15 | 2 |
| 584028467 | Color pencil | 15 | 10 |
+------------------+-----------------+-------------+----------------+
Just test if the string has any embedded spaces or hyphens and also that the last word when delimited by space or hyphen is 00 or 01 or 02 then chop off the last three characters.
data have;
infile cards dsd dlm='|' truncover ;
input id :$20. product :$20. amount_sold price_per_unit;
cards;
123456789 | Pen | 30 | 2 |
63495837229-01 | Book | 20 | 5 |
ABC134475472 02 | Folder | 29 | 7 |
AB-1235674467-00 | Pencil | 26 | 1 |
69598346-02 | Correction pen | 15 | 1.50 |
6970457688 | Highlighter | 15 | 2 |
584028467 | Color pencil | 15 | 10 |
;
data want;
set have ;
if indexc(trim(id),'- ') and scan(id,-1,'- ') in ('00' '01' '02') then
id = substrn(id,1,length(id)-3)
;
run;
Result
amount_ price_
Obs id product sold per_unit
1 123456789 Pen 30 2.0
2 63495837229 Book 20 5.0
3 ABC134475472 Folder 29 7.0
4 AB-1235674467 Pencil 26 1.0
5 69598346 Correction pen 15 1.5
6 6970457688 Highlighter 15 2.0
7 584028467 Color pencil 15 10.0