Search code examples
javascriptsqlstored-proceduressnowflake-cloud-data-platformdynamic-programming

What are other ways of formatting the SQL in a JavaScript stored procedure in Snowflake?


I created a stored procedure in Snowflake that lets me identify duplicate rows of data in a given table and stop the subsequent code from running.

The parameters for this stored procedure are the Database, Schema, Table, and one or more Natural keys from the respective table since finding duplicate rows in one table may require more than 1 natural key in the ROW_NUMBER() function.

CREATE OR REPLACE PROCEDURE FIND_DUPLICATE_ROWS(DB_NAME STRING, SCHEMA_NAME STRING, TABLE_NAME STRING, NATURAL_KEYS STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
    // Split the Natural keys into an array and trim whitespaces
    var keysArray = NATURAL_KEYS.split(',').map(function(item) { return item.trim(); });

    // Format the Natural keys into comma separated string for the query
    var NaturalkeyString = keysArray.join(', ');

    // Assemble the SQL query to identify duplicate rows
    var sql_query = 
        'WITH duplicates AS (' +
            'SELECT ' + NaturalkeyString +
            ', ROW_NUMBER() OVER (PARTITION BY ' + NaturalkeyString + ' ORDER BY ' + NaturalkeyString + ') AS RowNumber ' +
            'FROM "' + DB_NAME + '"."' + SCHEMA_NAME + '"."' + TABLE_NAME + '"' +
        ') ' +
        'SELECT * FROM duplicates WHERE RowNumber > 1';

    var stmt = snowflake.createStatement({sqlText: sql_query});

    var result = stmt.execute();

    // Check if there are duplicate rows
    if(result.next()){
        throw 'ERROR: Duplicate rows found in the table ' + TABLE_NAME;
    }
    else{
        return 'No duplicate rows found in the table ' + TABLE_NAME;
    }
$$;

-- Test the Stored Procedure:
CALL FIND_DUPLICATE_ROWS('SUPA_DB', 'RAW', 'DIM_EMP_DETAILS', 'ID,EMP_NAME,ORG');

--If the "FIND_DUPLICATE_ROWS" stored procedure works the following code should NOT execute after calling "FIND_DUPLICATE_ROWS" stored procedure:
SELECT 7887;

My issue is that I don't like how I have to concatenate the SQL to make it work, is there a cleaner way to do this?


Solution

  • I found my mistake, instead of backticks ( ` ) I was using single quotes ( ' ) it was just hard for me to see. Then by using JavaScript Template Literals with the syntax:

    `String Text ${Expression} string text`
    

    I was able to format my stored procedure appropriately as shown below:

    CREATE OR REPLACE PROCEDURE FIND_DUPLICATE_ROWS(
    DB_NAME STRING, 
    SCHEMA_NAME STRING, 
    TABLE_NAME STRING, 
    NATURAL_KEYS STRING
    )
    RETURNS STRING
    LANGUAGE JAVASCRIPT
    EXECUTE AS CALLER
    AS
    $$
        // Split the Natural keys into an array and trim whitespaces
        var keysArray = NATURAL_KEYS.split(',').map(function(item) { return item.trim(); });
    
        // Format the Natural keys into comma separated string for the query
        var NaturalKeyString = keysArray.join(', ');
    
        // Assemble the SQL query to identify duplicate rows
        var sql_query = `
            SELECT 
                ${NaturalKeyString},
                ROW_NUMBER() OVER(PARTITION BY ${NaturalKeyString} ORDER BY ${NaturalKeyString}) AS RowNumber
            FROM ${DB_NAME}.${SCHEMA_NAME}.${TABLE_NAME}
            QUALIFY RowNumber > 1;`;
    
        var stmt = snowflake.createStatement({sqlText: sql_query});
    
        var result = stmt.execute();
    
        error_message = `"ERROR: Duplicate rows found in ${DB_NAME}.${SCHEMA_NAME}.${TABLE_NAME}"`;
    
        // Check if there are duplicate rows
        if(result.next()){
            throw error_message;
        }
        else{
            return 0;
        }
    $$;
    
    -- SUPA_DB.RAW.DIM_EMP_DETAILS
    CALL FIND_DUPLICATE_ROWS('SUPA_DB', 'RAW', 'DIM_EMP_DETAILS', 'ID,EMP_NAME,ORG');
    
    --If the "FIND_DUPLICATE_ROWS" stored procedure works the following code should NOT execute after calling "FIND_DUPLICATE_ROWS" stored procedure:
    SELECT 7887;