This Postgres command removes all backslashes when run from the psql
command line:
UPDATE table SET column = REGEXP_REPLACE(column, '\B', '', 'g');
This Rails command doesn't work (it runs without error, but doesn't remove the backslash):
ActiveRecord::Base.connection.execute("UPDATE table SET column = REGEXP_REPLACE(column, '\B', '', 'g'))
It looks like Rails changes the query and runs this (I inspected the output from running the query in the console):
UPDATE table SET column = REGEXP_REPLACE(column, 'B', '', 'g')
The Postgres command also successfully removes the backslash when run from the psql
command line:
UPDATE table SET column = REGEXP_REPLACE(column, '\\', '', 'g')
This same Postgres command surprisingly errors out when run with Rails:
ActiveRecord::Base.connection.execute("UPDATE table SET column = REGEXP_REPLACE(column, '\\', '', 'g')")
Here's the error:
PG::InvalidRegularExpression: ERROR: invalid regular expression: invalid escape \ sequence
Here's the Postgres version that's set in my Gemfile
:
gem 'pg', '>= 0.18', '< 2.0'
Can you please explain why I'm getting this error and how to write Rails code that will remove all backslashes from a column? This will be run on a large data set and I'll need a fast solution.
What you're trying to do is complicated by having multiple things that want to interpret \
in their own way:
\
means something inside a double quoted string in Ruby. So "\B"
is just a misleading and complicated way of writing "B"
in Ruby. You'd need to say "\\B"
to get a string containing the two characters \B
.
\
means something inside a PostgreSQL regex. Hence the presence of \B
in PostgreSQL's regexes so that you don't have to count \
s when looking at things like \\\\
inside another language, you'd just have slightly less ugly things like \\B
.
To get around your problem you could just properly escape your escapes:
ActiveRecord::Base.connection.execute("UPDATE table SET column = REGEXP_REPLACE(column, '\\B', '', 'g')")
When the database sees that, it will see:
UPDATE table SET column = REGEXP_REPLACE(column, '\B', '', 'g')
because \\
means "give me just one \
" in a double quoted Ruby string.
I'd probably simplify further and use the SQL replace
function:
replace(string text, from text, to text)
Replace all occurrences in string of substring from with substring to.
combined with a %q{...}
string in Ruby:
ActiveRecord::Base.connection.execute(%q{
update table
set column = replace(column, '\', '')
})
The %q{...}
gets rid of one escaping problem because it acts like a single-quoted string in Ruby so \
doesn't mean anything special. Using replace
in the SQL gets rid of another escaping problem because it only works with plain old strings (rather than regexes represented by strings) so \
again doesn't mean anything.