Search code examples
sasprobability

Need to create a column in a dataset based on the value in a different dataset


I have a dataset that has a client id, and a uniform random variable for each id that looks something like this:

ClientId RandUnif
1234 0.38223456
5678 0.7456789

*client ids are unique and serve as the primary key for this table

I then have another data set that has a “number of outcomes field” and a cumulative probability column that looks like this:

NumOutcomes CumulativeProb
1 0.543678
2 0.756789

My goal is to compare the uniform random variable from my client id table to the cumulative probability column in the second table in order to create a new column in the client table.

So, for instance the first row of the client table would have a value in the new column of “1” because it is less than our equal to the first cumulative probability. The second row would have a “2” in the new column because it’s greater than the first cumulative probability but less than the second, and so on.

I have tried using a do loop, and a sql query to no avail


Solution

  • You can do a cross-join where RandomUnif <= CumulativeProb. This would get you a table like this:

    ClientID    NumOutcomes RandUnif    CumulativeProb
    1234        1           0.38223456  0.543678
    1234        2           0.38223456  0.756789
    5678        2           0.7456789   0.756789
    

    From there, you'll take the minimum value of CumulativeProb by each ClientID since the minimum value is less than or equal to the first cumulative probability. You can do this with a sql having clause.

    proc sql;
        create table want as
            select ClientID
                 , NumOutcomes
                 , RandUnif
                 , CumulativeProb
            from have, lookup
            where RandUnif LE CumulativeProb
            group by ClientID
            having CumulativeProb = min(CumulativeProb)
        ;
    quit;
    
    ClientID    NumOutcomes RandUnif    CumulativeProb
    1234        1           0.38223456  0.543678
    5678        2           0.7456789   0.756789