I have a dataset that has a client id, and a uniform random variable for each id that looks something like this:
ClientId | RandUnif |
---|---|
1234 | 0.38223456 |
5678 | 0.7456789 |
*client ids are unique and serve as the primary key for this table
I then have another data set that has a “number of outcomes field” and a cumulative probability column that looks like this:
NumOutcomes | CumulativeProb |
---|---|
1 | 0.543678 |
2 | 0.756789 |
My goal is to compare the uniform random variable from my client id table to the cumulative probability column in the second table in order to create a new column in the client table.
So, for instance the first row of the client table would have a value in the new column of “1” because it is less than our equal to the first cumulative probability. The second row would have a “2” in the new column because it’s greater than the first cumulative probability but less than the second, and so on.
I have tried using a do loop, and a sql query to no avail
You can do a cross-join where RandomUnif <= CumulativeProb
. This would get you a table like this:
ClientID NumOutcomes RandUnif CumulativeProb
1234 1 0.38223456 0.543678
1234 2 0.38223456 0.756789
5678 2 0.7456789 0.756789
From there, you'll take the minimum value of CumulativeProb
by each ClientID since the minimum value is less than or equal to the first cumulative probability. You can do this with a sql having
clause.
proc sql;
create table want as
select ClientID
, NumOutcomes
, RandUnif
, CumulativeProb
from have, lookup
where RandUnif LE CumulativeProb
group by ClientID
having CumulativeProb = min(CumulativeProb)
;
quit;
ClientID NumOutcomes RandUnif CumulativeProb
1234 1 0.38223456 0.543678
5678 2 0.7456789 0.756789