Search code examples
databasesasdata-management

Advice on how best to manage this dataset?


New to SAS and would appreciate advice and help on how best to handle this data mangement situation.

I have a dataset in which each observation represents a client. Each client has a "description" variable which could include either a comprehensive assessment, treatment or discharge. I have created 3 new variables to flag each observation if they contain one of these.

So for example:

treat_yes = 1 if description contains "tx", "treatment" dc_yes = 1 if description contains "dc", "d/c" or "discharge" ca_yes = 1 if desciption contains "comprehensive assessment" or "ca" or "comprehensive ax"

My end goal is to have a new dataset of clients that have gone through a Comprehensive Assessment, Treatment and Discharge.

I'm a little stumped as to what my next move should be here. I have all my variables flagged for clients. But there could be duplicate observations just because a client could have come in many times. So for example:

Client_id    treatment_yes    ca_yes   dc_yes
1234               0            1        1
1234               1            0        0
1234               1            0        1

All I really care about is if for a particular client the variables treatment_yes, ca_yes and dc_yes DO NOT equal 0 (i.e., they each have at least one "1". They could have more than one "1" but as long as they are flagged at least once).

I was thinking my next step might be to collapse the data (how do you do this?) for each unique client ID and sum treatment_yes, dc_yes and ca_yes for each client.

Does that work?

If so, how the heck do I accomplish this? Where do I start?

thanks everyone!


Solution

  • I think the easiest thing to do at this point is to use a proc sql step to find the max value of each of your three variables, aggregated by client_id:

    data temp;
        input Client_id $ treatment_yes ca_yes dc_yes;
        datalines;
        1234 0 1 1
        1234 1 0 0
        1234 1 0 1
        ;
    run;
    
    proc sql;
        create table temp_collapse as select distinct
            client_id, max(treatment_yes) as treatment_yes,
            max(ca_yes) as ca_yes, max(dc_yes) as dc_yes
            from temp
            group by client_id;
    quit;
    

    A better overall approach would be to use the dataset you used to create the _yes variables and do something like max(case when desc = "tx" then 1 else 0 end) as treatment_yes etc., but since you're still new to SAS and understand what you've done so far, I think the above approach is totally sufficient.