I have the following table:
COMPANY_NAME | GROUP | COUNTRY | STATUS
COM1 | 1 | DE | DELETED
COM2 | 1 | DE | REMAINING
COM3 | 1 | UK | DELETED
COM4 | 2 | ES | DELETED
COM5 | 2 | FR | DELETED
COM6 | 3 | RO | DELETED
COM7 | 3 | BG | DELETED
COM8 | 3 | ES | REMAINING
COM9 | 3 | ES | DELETED
I need to get:
COMPANY_NAME | GROUP | COUNTRY | STATUS
COM3 | 1 | UK | DELETED
COM4 | 2 | ES | DELETED
COM5 | 2 | FR | DELETED
COM6 | 3 | RO | DELETED
COM7 | 3 | BG | DELETED
So, I need all entries where the status is DELETED and within each GROUP there is no COMPANY_NAME that has a status of REMAINING for the same country as the DELETED status. I can use a PROC SQL or a DATA step.
What I have tried so far is:
PROC SQL;
CREATE TABLE WORK.OUTPUT AS
SELECT *
FROM WORK.INPUT
WHERE STATUS = 'DELETED' AND COUNTRY NOT IN (SELECT COUNTRY FROM WORK.INPUT WHERE STATUS = 'REMAINING');
QUIT;
but this obviously excludes all REMAINING countries from all GROUPs.
I also tried a data step:
DATA WORK.OUTPUT;
SET WORK.INPUT;
BY GROUP COUNTRY;
IF NOT (STATUS = 'DELETED' AND COUNTRY NOT IN (COUNTRY WHERE STATUS = 'REMAINING')) THEN DELETE;
RUN;
but the syntax is incorrect because I don't know the proper way of writing this.
Try this out:
proc sql;
select * from your_table
where status = 'deleted' and
catx("_",country,group) not in
(select catx("_",country,group) from your_table where status='remaining');
quit;
Output:
company_name | group | country | status
com3 | 1 | UK | deleted
com4 | 2 | ES | deleted
com5 | 2 | FR | deleted
com6 | 3 | RO | deleted
com7 | 3 | BG | deleted