Search code examples
sqlsnowflake-cloud-data-platformhtmlcleaner

how to remove all html characters in snowflake, dont want to include all html special characters in query (no hardcoding)


Want to remove below kind of characters from string..pl help ' &


Solution

  • You may try this one to remove any HTML special characters:

    select REGEXP_REPLACE( 'abc&def³»ghi', '&[^&]+;', '!'  );
    

    Explanation:

    REGEXP_REPLACE uses regular expression to search and replace. I search for "&[^&]+;" and replace it with "!" for demonstration. You can of course use '' to remove them. More info about the function:

    https://docs.snowflake.com/en/sql-reference/functions/regexp_replace.html

    About the regular expression string:

    • & is the & character of a HTML special character
    • [^&] means any character except &. Tthis prevents to REGEXP to replace all characters between the first '&' char and last ';'. It will stop when it see second '&'
    • + means match 1 or more of preceding token (any character except &)
    • ; is the last character of a HTML special character