I'm using BigQuery and trying to use declared variables in my stored procedure. In the snippet below, I'm able to use _error_rate
inside my CTEs, but I am not able to use _total_records
at the bottom to limit the size of the temporary table.
The error I get from BigQuery is:
Syntax error: Unexpected identifier "_total_records" at [19:16]
CREATE OR REPLACE PROCEDURE my_dataset.my_procedure()
BEGIN
DECLARE _error_rate DECIMAL DEFAULT 0.05;
DECLARE _total_records INT64 DEFAULT 100;
CREATE TEMPORARY TABLE temp_final_rows_1 AS
(WITH cte_final_rows AS
(SELECT CASE
WHEN RAND() < _error_rate
THEN FLOOR(RAND() * 1000000000)
WHEN RAND() < _error_rate THEN NULL
ELSE `NPI` END AS `NPI`,
-- etc.
FROM my_dataset.v_base_rows)
SELECT *
FROM cte_final_rows
ORDER BY RAND()
LIMIT _total_records); -- Syntax error: Unexpected identifier "_total_records"
-- etc.
END;
I tried replacing _total_records
with a literal value 100
and it worked. so I know it's okay with _error_rate
in the CTE because the procedure works as expected.
I can't find anywhere in the BigQuery documentation that explains why _total_records
would be out of scope here. I expect either both variables to be in scope, or both out of scope. I'm very confused why one is accessible and the other is not. Is this a BigQuery bug or is this a quirk of all SQL dialects? I'm having trouble googling for solutions or explanations. Any help is appreciated.
Previously, I had the CREATE TEMPORARY TABLE
statement inside an EXECUTE IMMEDIATE """ ... """
string and had no trouble injecting _total_records
by using string concatenation. I really don't want to do that though because it's very messy and my IDE doesn't offer any help with queries as big strings...
Query parameters can be used as substitutes for arbitrary expressions. Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query.
It is not clear what the "other parts of the query" are. Apparently LIMIT
is in that group. We found a workaround that uses ROW_NUMBER()
and a WHERE
instead of LIMIT
.
CREATE OR REPLACE PROCEDURE my_dataset.my_procedure()
BEGIN
DECLARE _error_rate DECIMAL DEFAULT 0.05;
DECLARE _total_records INT64 DEFAULT 100;
CREATE TEMPORARY TABLE temp_final_rows_1 AS
(WITH cte_final_rows AS
(SELECT ROW_NUMBER() OVER(ORDER BY RAND()) as row_num,
CASE
WHEN RAND() < _error_rate
THEN FLOOR(RAND() * 1000000000)
WHEN RAND() < _error_rate THEN NULL
ELSE `NPI` END AS `NPI`,
-- etc.
FROM my_dataset.v_base_rows)
SELECT * EXCEPT (row_num)
FROM cte_final_rows
WHERE row_num <= _total_records);
-- etc.
END;