Search code examples
sqlrandomsnowflake-cloud-data-platformuniform-distribution

Snowflake returning 0 to N records when I expect exactly 1 record


Something that I'm not expecting is going on here. I have a table applications with 22 records with IDs 1 to 22.

I'd like to randomly select 1 record for sample data. But rather than selecting each row with equal probability, I'd like a more interesting (realistic??) distribution. I settled on the product of 2 uniform random variables. The choice of 25 below was to keep the tail from being too tiny, but I think that's not important to the issue.

My query isn't overly complicated. I added the CTE as a convenience so that anyone can run this as a self-contained query. But in my actual query I get this behavior with no CTE.

with applications as (
    select 1 + seq4() as ID, randstr(5, random()) as NAME
    from table(generator(rowcount => 22)) v
)
SELECT
    NAME
    , ID
FROM
    applications a
WHERE
    a.ID = LEAST(
        (
            SELECT 1 + round( 25 * uniform(0::float, 1::float, random()) * uniform(0::float, 1::float, random()) )
        )
        , 22
    )

It returns rows with a frequency that seems great for my purposes... but there's a catch. I do not understand why this query returns a variable number of records.

Returning 1 record appears to be the most common result. But returning 0 or 2 records happens regularly. Sometimes 3. I haven't seen 4 records... but maybe it's possible.

In the example below you can see that I got 2 records. How is it possible that ID is equal to both 4 and 7?

enter image description here

I'm quite interested in both:

  • Why do I get 0 to N records with this query?
  • What variation would produce exactly 1 record in all cases?

Solution

  • The short answer is that I don't know why this is happening - but if you look at the query profile you may be able to see what's going on.

    However, a solution seems to be to put the calculation of the filter value in a CTE rather than in the main SQL statement. The following only gives me one record each time I've run it:

    with filter_value (fil_val) as (
        SELECT 1 + 
        round( 25 * uniform(0::float, 1::float, random()) * 
        uniform(0::float, 1::float, random()) )
    )
    SELECT
        NAME
        , ID
    FROM
        applications a
    WHERE A.ID = LEAST((SELECT FIL_VAL FROM FILTER_VALUE), 22)
    ;