Search code examples
ruby-on-railspostgresqlactiverecordpg

Postgres command to remove backslashes works in console, but not with ActiveRecord::Base.connection.execute


This Postgres command removes all backslashes when run from the psql command line:

UPDATE table SET column = REGEXP_REPLACE(column, '\B', '', 'g');

This Rails command doesn't work (it runs without error, but doesn't remove the backslash):

ActiveRecord::Base.connection.execute("UPDATE table SET column = REGEXP_REPLACE(column, '\B', '', 'g'))

It looks like Rails changes the query and runs this (I inspected the output from running the query in the console):

UPDATE table SET column = REGEXP_REPLACE(column, 'B', '', 'g')

The Postgres command also successfully removes the backslash when run from the psql command line:

UPDATE table SET column = REGEXP_REPLACE(column, '\\', '', 'g')

This same Postgres command surprisingly errors out when run with Rails:

ActiveRecord::Base.connection.execute("UPDATE table SET column = REGEXP_REPLACE(column, '\\', '', 'g')")

Here's the error:

PG::InvalidRegularExpression: ERROR: invalid regular expression: invalid escape \ sequence

Here's the Postgres version that's set in my Gemfile:

gem 'pg', '>= 0.18', '< 2.0'

Can you please explain why I'm getting this error and how to write Rails code that will remove all backslashes from a column? This will be run on a large data set and I'll need a fast solution.


Solution

  • What you're trying to do is complicated by having multiple things that want to interpret \ in their own way:

    1. \ means something inside a double quoted string in Ruby. So "\B" is just a misleading and complicated way of writing "B" in Ruby. You'd need to say "\\B" to get a string containing the two characters \B.

    2. \ means something inside a PostgreSQL regex. Hence the presence of \B in PostgreSQL's regexes so that you don't have to count \s when looking at things like \\\\ inside another language, you'd just have slightly less ugly things like \\B.

    To get around your problem you could just properly escape your escapes:

    ActiveRecord::Base.connection.execute("UPDATE table SET column = REGEXP_REPLACE(column, '\\B', '', 'g')")
    

    When the database sees that, it will see:

    UPDATE table SET column = REGEXP_REPLACE(column, '\B', '', 'g')
    

    because \\ means "give me just one \" in a double quoted Ruby string.

    I'd probably simplify further and use the SQL replace function:

    replace(string text, from text, to text)
    Replace all occurrences in string of substring from with substring to.

    combined with a %q{...} string in Ruby:

    ActiveRecord::Base.connection.execute(%q{
      update table
      set column = replace(column, '\', '')
    })
    

    The %q{...} gets rid of one escaping problem because it acts like a single-quoted string in Ruby so \ doesn't mean anything special. Using replace in the SQL gets rid of another escaping problem because it only works with plain old strings (rather than regexes represented by strings) so \ again doesn't mean anything.