Search code examples
sqlregexsnowflake-cloud-data-platform

SQL RegEx to confirm a format


I am trying to confirm value of a field to match a certain format.

enter image description here

Here is work in progress, which is passing for most of values but fails when value_5 is missing (which is optional). Also is there an elegant way to confirm it?

This SQL is passing but I believe it is not correct:

SELECT REGEXP_LIKE('3-Abc-Def_ValueNumber2_ValueNumber3_ValueNumber4_ValueNumber5','[0-9]{1,2}-[A-Za-z0-9]{3}-[A-Za-z0-9]{3}_[A-Za-z0-9]{3,17}_[A-Za-z0-9]{3,17}_[A-Za-z0-9]{3,20}_[A-Za-z0-9]{3,20}.*');

This one failing, when optional parameter is missing:

SELECT REGEXP_LIKE('3-Abc-Def_ValueNumber2_ValueNumber3_ValueNumber4_ValueNumber5_ValueNumber6','[0-9]{1,2}-[A-Za-z0-9]{3}-[A-Za-z0-9]{3}_[A-Za-z0-9]{3,17}_[A-Za-z0-9]{3,17}_[A-Za-z0-9]{3,20}_[A-Za-z0-9]{3,20}_[A-Za-z0-9]{3,20}.*');

Following are some other sample values to confirm the solution as well:

'3-ABC-DEF_GHI_ValueNumber2_ValueNumber3_ValueNumber4__ValueNumber6'
'23-ABC-DEF_GHI_ValueNumber2_Value3_ValueNumber4'
'23-AB-DEF_GHIJK_ValueNumber2_ValueNumber3_ValueNumber4_ValueNumber5'

Solution

  • Used split() to break down value to it's elements and then verified format. Still love to have a regex solution (if anyone comes up with it)

    CREATE OR REPLACE TEMP TABLE temp_validate_pattern (pixel_value VARCHAR);
    INSERT INTO temp_validate_pattern
    VALUES 
    ('03-ABC-DEF_GHI_ValueNumber3_ValueNumber4__ValueNumber6'),
    ('23-ABC-DEF_GHI_Value3_ValueNumber4'),
    ('23-ABC-DEF_GHI_ValueNumber3_ValueNumber4_ValueNumber5');
    
    WITH cte_data AS (
        SELECT 
            pixel_value
          , split(pixel_value,'_') AS pixel_element
        FROM temp_validate_pattern
        GROUP BY ALL 
    )
    , cte_split_data AS (
        SELECT 
            t.pixel_value
          , t.pixel_element[0]::varchar AS value_1
          , t.pixel_element[1]::varchar AS value_2 
          , t.pixel_element[2]::varchar AS value_3
          , t.pixel_element[3]::varchar AS value_4
          , t.pixel_element[4]::varchar AS value_5
          , t.pixel_element[5]::varchar AS value_6
        FROM cte_data t
        GROUP BY ALL
    )
    , cte_flag_data AS (
        SELECT  
            pixel_value
          , value_1
          , REGEXP_LIKE(value_1,'[0-9]{2}-[A-Za-z0-9]{3}-[A-Za-z0-9]{3,7}') AS value_1_flag
          , value_2
          , REGEXP_LIKE(value_2,'[A-Za-z0-9]{3}') AS value_2_flag
          , value_3
          , REGEXP_LIKE(value_3,'^[A-Za-z0-9]{0,30}$') AS value_3_flag 
          , value_4
          , CASE WHEN value_4 IS NULL THEN TRUE
                 ELSE REGEXP_LIKE(value_4,'^[A-Za-z0-9]{0,30}$')                     
            END AS value_4_flag
          , value_5
          , CASE WHEN value_5 IS NULL THEN TRUE
                 ELSE REGEXP_LIKE(value_5,'^[A-Za-z0-9]{0,30}$')                     
            END AS value_5_flag
        FROM cte_split_data
    )
    -- SELECT * FROM cte_flag_data;
    SELECT 
        pixel_value
      , (value_1_flag AND value_1_flag AND value_1_flag AND value_1_flag AND value_1_flag) AS valid_campaign
    FROM cte_flag_data
    ORDER BY 1;