Search code examples
snowflake-cloud-data-platformlimitoffsetarray-agg

Can I batch Snowflake json output into separate rows based on LIMIT/OFFSET


I have a query in Snowflake that produces json output in rows. 29,000+ rows. I am using this query to create records in an MDM system. However, this ingestion process is a bit inefficient and costly. I have since taken my final SELECT statement that created the json and placed in a CTE. Then I made a new final SELECT statement

SELECT ARRAY_AGG(*) as clin_prov
FROM clin_prov

This overwhelmed the system in its size. Using LIMIT in the CTE of 1000, and even 5000, produced results, making a large json record, as expected, and desired. I'd like to have 30 rows of arrays with <= 1000 json records in the final output. Is there a way to do that? Can I create Arrays based on LIMIT and OFFSET and feed into separate rows? I

I tried with and without LIMIT in the clin_prov cte. I tried to use separate ctes based on LIMIT and OFFSET then UNION ALL. That overloaded as well.


Solution

  • Rows can be grouped into "buckets" of predefined size. Example:

    WITH cte AS ( 
        SELECT CEIL(ROW_NUMBER() OVER(ORDER BY NULL) / 30) AS grp, * 
        FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF10.ORDERS 
    ) 
    SELECT grp, ARRAY_AGG(OBJECT_CONSTRUCT(*)) 
    FROM cte 
    GROUP BY grp;