Basically, I want to check if a string (main) starts with another string (sub), using both of the above methods. For example, the following is my code:
main = gets.chomp
sub = gets.chomp
p main.start_with? sub
p main[/^#{sub}/]
And here is an example with I/O - Try it online!
If I enter simple strings, then both of them work exactly the same, but when I enter strings like "1\2" in stdin, I get errors in the Regexp variant, as seen in the TIO example.
I guess this is because the string passed into the second one isn't raw. So, I tried passing sub.dump into the second one - Try it online! - which gives me a nil result. How do I do this correctly?
As a general rule, you should never ever blindly execute inputs from untrusted sources.
Interpolating untrusted input into a Regexp is not quite as bad as interpolating it into, say, Kernel#eval, because the worst thing an attacker can do with a Regexp is to construct an Evil Regex to conduct a Regular expression Denial of Service (ReDoS) attack (see also the section on Performance in the Regexp documentation), whereas with eval, they could execute arbitrary code, including but not limited to deleting the entire file system, scanning memory for unencrypted passwords / credit card information / PII and exfiltrating that via the network, etc.
However, it is still a bad idea. For example, when I say "the worst thing that can happen is a ReDoS", that assumes that there are no bugs in the Regexp implementation (Onigmo in the case of YARV, Joni in the case of JRuby and TruffleRuby, etc.). Ruby's Regexps are quite powerful, and thus Onigmo, Joni and co. are large and complex pieces of code that may very well have their own security holes, which could be exploited by a specially crafted Regexp.
You should properly sanitize and escape the user input before constructing the Regexp. Thankfully, the Ruby core library already contains a method which does exactly that: Regexp::escape. So, you could do something like this:
p main[/^#{Regexp.escape(sub)}/]
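As a quick check that the escaped version agrees with start_with? for the problematic input, here is a minimal sketch. The two sample values are hypothetical stand-ins for what would be typed into stdin; they are not taken from the TIO links.

```ruby
# Hypothetical sample values standing in for the two lines read from stdin.
# Single quotes keep the backslash literal, just like input read via gets.
main = '1\2345'
sub  = '1\2'

# Regexp.escape doubles the backslash so that it matches a literal backslash
# instead of being parsed as the backreference \2.
escaped = Regexp.escape(sub)
pattern = /^#{escaped}/

prefix_check = main.start_with?(sub) # plain prefix test, no Regexp involved
regexp_match = main[pattern]         # matched text, or nil if there is no match

p escaped
p prefix_check
p regexp_match
```

Note that the two variants still return different kinds of values: start_with? gives true or false, while String#[] with a Regexp gives the matched text or nil.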
The reason why your attempt at using String#dump didn't work is that String#dump is for representing a String the same way you would have to write it as a String literal, i.e. it is escaping String metacharacters, not Regexp metacharacters, and it is including the quote characters around the String that you need to have it recognized as a String literal. You can easily see that when you simply try it out:
sub.dump
#=> "\"1\\\\2\""
# puts sub.dump prints the six characters: "1\\2"
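So, that means that String#dump
- escapes characters that don't need escaping in a Regexp just because they need escaping in Strings (e.g. # or "), and
- doesn't escape characters that need escaping in a Regexp but not in Strings (e.g. [, ., ?, *, +, ^, -).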
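To make the difference between String#dump and Regexp.escape concrete, here is a small sketch. All sample strings are hypothetical and chosen only for illustration.

```ruby
# Hypothetical sample values; single quotes keep the backslash literal.
main = '1\2345'
sub  = '1\2'

# dump wraps the text in quote characters, so the resulting pattern demands
# a leading double quote that main does not have.
dumped     = sub.dump
dump_match = main[/^#{dumped}/]

# An unescaped "." matches any character, which gives a false positive:
# "axb" does not start with "a.b", yet the raw pattern matches it.
dotted       = 'a.b'
raw_match    = 'axb'[/^#{dotted}/]
escaped_miss = 'axb'[/^#{Regexp.escape(dotted)}/]
escaped_hit  = 'a.bc'[/^#{Regexp.escape(dotted)}/]

p dumped
p dump_match
p raw_match
p escaped_miss
p escaped_hit
```

The second half also shows why interpolating sub without escaping is not only a problem for backslashes: any Regexp metacharacter in the input silently changes what the pattern matches.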