Search code examples
sassas-macroproc-sql

How to extracting all values that contain part of particular number and then deleting them?


How do you extract all values containing part of a particular number and then delete them? I have data where the ID contains different lengths and wants to extract all the IDs with a particular number. For example, if the ID contains either "-00" or "02" or "-01" at the end, pull to be able to see the hit rate that includes those—then delete them from the ID. Is there a more effecient way in creating this code?

I tried to use the substring function to slice it to get the result, but there is some other ID along with the specified position.

Code:

Proc sql;
Create table work.data1 AS
SELECT Product, Amount_sold, Price_per_unit,
CASE WHEN Product Contains "Pen" and Lenghth(ID) >= 9 Then ID = SUBSTR(ID,1,9)
WHEN Product Contains "Book" and Lenghth(ID) >= 11 Then ID = SUBSTR(ID,1,11)
WHEN Product Contains "Folder" and Lenghth(ID) >= 12 Then ID = SUBSTR(ID,1,12)
...
END AS ID
FROM A 
Quit;

Have:

+------------------+-----------------+-------------+----------------+
|     ID           |  Product        | Amount_sold | Price_per_unit |
+------------------+-----------------+-------------+----------------+
| 123456789        | Pen             |      30     |        2       |
| 63495837229-01   | Book            |      20     |        5       |
| ABC134475472 02  | Folder          |      29     |        7       |
| AB-1235674467-00 | Pencil          |      26     |        1       |
| 69598346-02      | Correction pen  |      15     |      1.50      |
| 6970457688       | Highlighter     |      15     |        2       |
| 584028467        | Color pencil    |      15     |       10       |
+------------------+-----------------+-------------+----------------+

Wanted the final result:

+------------------+-----------------+-------------+----------------+
|     ID           |  Product        | Amount_sold | Price_per_unit |
+------------------+-----------------+-------------+----------------+
| 123456789        | Pen             |      30     |        2       |
| 63495837229      | Book            |      20     |        5       |
| ABC134475472     | Folder          |      29     |        7       |
| AB-1235674467    | Pencil          |      26     |        1       |
| 69598346         | Correction pen  |      15     |      1.50      |
| 6970457688       | Highlighter     |      15     |        2       |
| 584028467        | Color pencil    |      15     |       10       |
+------------------+-----------------+-------------+----------------+

Solution

  • Just test if the string has any embedded spaces or hyphens and also that the last word when delimited by space or hyphen is 00 or 01 or 02 then chop off the last three characters.

    data have;
      infile cards dsd dlm='|' truncover ;
      input id :$20. product :$20. amount_sold price_per_unit;
    cards;
    123456789        | Pen             |      30     |        2       |
    63495837229-01   | Book            |      20     |        5       |
    ABC134475472 02  | Folder          |      29     |        7       |
    AB-1235674467-00 | Pencil          |      26     |        1       |
    69598346-02      | Correction pen  |      15     |      1.50      |
    6970457688       | Highlighter     |      15     |        2       |
    584028467        | Color pencil    |      15     |       10       |
    ;
    
    data want;
      set have ;
      if indexc(trim(id),'- ') and scan(id,-1,'- ') in ('00' '01' '02') then
        id = substrn(id,1,length(id)-3)
      ;
    run;
    

    Result

                                              amount_     price_
    Obs    id               product             sold     per_unit
    
     1     123456789        Pen                  30         2.0
     2     63495837229      Book                 20         5.0
     3     ABC134475472     Folder               29         7.0
     4     AB-1235674467    Pencil               26         1.0
     5     69598346         Correction pen       15         1.5
     6     6970457688       Highlighter          15         2.0
     7     584028467        Color pencil         15        10.0