Search code examples
rubyregexstringheredocstring-interpolation

Ruby string interpolate variables but dont alter regex


I need to return a string that contains a regex and interpolates instance variables. The string needs to look like this:

"SELECT field1, field2 CASE WHEN REGEXP_CONTAINS(field3, r"^\".*\"$") THEN 'this' WHEN REGEXP_CONTAINS(field3, r"^\[.*]$") THEN 'that' WHEN field3 = '(not provided)' THEN NULL ELSE 'the_other' END AS better_field_3, field4 FROM `interpolated_table_name1` AS tbl LEFT JOIN `interpolated_table_name2` AS tbl2 ON blah = blah"

I used this code to generate it:

def string_query
  statement = 
    <<-HEREDOC
      SELECT
        field1,
        field2
        CASE WHEN REGEXP_CONTAINS(field3, r"^\".*\"$") THEN 'this'
          WHEN REGEXP_CONTAINS(field3, r"^\[.*]$") THEN 'that' 
          WHEN field3 = '(not provided)' THEN NULL
          ELSE 'the_other' END AS better_field_3,
        field4
      FROM `#{@dynamic_table_name1}` AS tbl
      LEFT JOIN `#{@dynamic_table_name2}` AS tbl2
        ON blah = blah
    HEREDOC
  statement.squish!
end

The string is housed in double quotes, which is why the regex is escaped. When I run this SQL upon the database to perform the query, the regex has been altered, and the extra backslashes to escape are not removed.


Solution

  • Heredocs act like double quoted strings as far as backslash-escapes are concerned so you have to manually escape your backslashes by doubling them:

    statement = 
      <<-HEREDOC
        SELECT
          field1,
          field2
          CASE WHEN REGEXP_CONTAINS(field3, r"^\\".*\\"$") THEN 'this'
            WHEN REGEXP_CONTAINS(field3, r"^\\[.*]$") THEN 'that' 
            WHEN field3 = '(not provided)' THEN NULL
            ELSE 'the_other' END AS better_field_3,
          field4
        FROM `#{@dynamic_table_name1}` AS tbl
        LEFT JOIN `#{@dynamic_table_name2}` AS tbl2
          ON blah = blah
      HEREDOC
    

    You could do away with the statement variable and call squish (or squish!) directly on the heredoc as well:

    def string_query
      <<-HEREDOC.squish
        SELECT
          field1,
          field2
          CASE WHEN REGEXP_CONTAINS(field3, r"^\\".*\\"$") THEN 'this'
            WHEN REGEXP_CONTAINS(field3, r"^\\[.*]$") THEN 'that' 
            WHEN field3 = '(not provided)' THEN NULL
            ELSE 'the_other' END AS better_field_3,
          field4
        FROM `#{@dynamic_table_name1}` AS tbl
        LEFT JOIN `#{@dynamic_table_name2}` AS tbl2
          ON blah = blah
      HEREDOC
    end
    

    BTW, I'm assuming that @dynamic_table_name1 and @dynamic_table_name2 are known to be safe so that you don't have to worry about interpolating those into your string without escaping.


    The double quotes in this:

    r"^\".*\"$"
    

    have nothing to do with how Ruby treats the ^\".*\"$. Double quotes inside a heredoc are just meaningless characters, they're nothing special. The heredoc itself provides the "double quoted string" context that is causing your backslashes to be treated specially.