ruby, string, methods, startswith

Get the same results from string.start_with? and string[ ]


Basically, I want to check whether a string (main) starts with another string (sub), using both of the above methods. For example, here is my code:

main = gets.chomp
sub = gets.chomp

p main.start_with? sub
p main[/^#{sub}/]

And, here is an example with I/O - Try it online!


If I enter simple strings, both of them work exactly the same, but when I enter a string like "1\2" on stdin, I get an error in the Regexp variant, as seen in the TIO example.

I guess this is because the string passed into the second one isn't raw. So, I tried passing sub.dump into the second one - Try it online!

which gives me a nil result. How do I do this correctly?


Solution

  • As a general rule, you should never ever blindly execute inputs from untrusted sources.

    Interpolating untrusted input into a Regexp is not quite as bad as interpolating it into, say, Kernel#eval. The worst thing an attacker can do with a Regexp is to construct an Evil Regex to conduct a Regular expression Denial of Service (ReDoS) attack (see also the section on Performance in the Regexp documentation). With eval, they could execute arbitrary code, including but not limited to deleting the entire file system, scanning memory for unencrypted passwords / credit card information / PII and exfiltrating that via the network, etc.

    However, it is still a bad idea. For example, when I say "the worst thing that can happen is a ReDoS", that assumes there are no bugs in the Regexp implementation (Onigmo in the case of YARV, Joni in the case of JRuby and TruffleRuby, etc.). Ruby's Regexps are quite powerful, and thus Onigmo, Joni and co. are large and complex pieces of code that may very well have their own security holes, which a specially crafted Regexp could exploit.

    You should properly sanitize and escape the user input before constructing the Regexp. Thankfully, the Ruby core library already contains a method which does exactly that: Regexp::escape. So, you could do something like this:

    p main[/^#{Regexp.escape(sub)}/]
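
To see the difference without typing into stdin, here is a minimal sketch with the two inputs hard-coded. The helper names `unsafe_prefix?` and `safe_prefix?` are made up for this illustration and are not part of the original code. The safe variant also uses `\A` instead of `^`, on the assumption that you want the start of the string rather than the start of any line (for single-line input from `gets.chomp` the two behave the same).

```ruby
# Hypothetical helpers for illustration only.

# Interpolates sub verbatim, so Regexp metacharacters in sub are interpreted.
def unsafe_prefix?(main, sub)
  !main[/^#{sub}/].nil?
end

# Escapes sub first, so every character in sub is matched literally.
# \A anchors at the start of the string; ^ would also match after a newline.
def safe_prefix?(main, sub)
  !main[/\A#{Regexp.escape(sub)}/].nil?
end

main = '1\2345'   # single quotes: the backslash is a literal character
sub  = '1\2'

p main.start_with?(sub)     # => true (the reference result)
p safe_prefix?(main, sub)   # => true (agrees with start_with?)

begin
  unsafe_prefix?(main, sub)
rescue RegexpError => e
  # \2 is read as a backreference to a group that does not exist
  puts "RegexpError: #{e.message}"
end

# Metacharacters that do not raise simply match the wrong thing:
p unsafe_prefix?('abc', 'a.c')   # => true  ("." matches any character)
p safe_prefix?('abc', 'a.c')     # => false (literal "a.c" is not a prefix)
```

The second pair of calls is arguably the more dangerous case: nothing fails loudly, the check just quietly answers a different question.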
    

    The reason your attempt at using String#dump didn't work is that String#dump represents a String the way you would have to write it as a String literal, i.e. it escapes String metacharacters, not Regexp metacharacters, and it includes the quote characters around the String that are needed for it to be recognized as a String literal. You can easily see that when you simply try it out:

    sub.dump
    #=> "\"1\\\\2\""
    # i.e. the six characters "1\\2", surrounding quotes included
    

    So, that means that String#dump

    • includes the quotes (which you don't want),
    • escapes characters that don't need escaping in Regexp just because they need escaping in Strings (e.g. # or "), and
    • doesn't escape characters that do need escaping in a Regexp, because they don't need escaping in Strings (e.g. [, ., ?, *, +, ^, -).
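
The points above can be checked side by side. This is a small sketch using the "1\2" input from the question, hard-coded rather than read from stdin; the variable names `dumped` and `escaped` are just for illustration.

```ruby
sub = '1\2'   # three characters: 1, backslash, 2

dumped  = sub.dump            # String-literal form, quotes included
escaped = Regexp.escape(sub)  # Regexp-safe form, no quotes

puts dumped    # prints "1\\2" (with the double quotes)
puts escaped   # prints 1\\2

# The dumped form only matches text that itself starts with a quote character,
# which is why the attempt in the question returned nil.
p '1\2345'[/^#{dumped}/]    # => nil
p '1\2345'[/^#{escaped}/]   # => "1\\2"

# dump leaves Regexp metacharacters untouched; Regexp.escape handles them.
p 'a.c'.dump             # => "\"a.c\""
p Regexp.escape('a.c')   # => "a\\.c"
```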