ruby, regex, alternation

Alternation gives unexpected result


In Ruby, I am trying to extract some patterns from a long string and put each match into an array of strings. For example, the input string could be

"\"/ebooks/1234.pdf\"  \"/magazines/4321.djvu\""

The expected result is

["/ebooks/1234.pdf", "/magazines/4321.djvu"]

That is, each match is a forward slash, followed by one of three keywords (ebooks, magazines, or newspapers), followed by another forward slash, followed by one or more characters that are neither whitespace nor a double quote.

I tried this pattern, which uses alternation (the vertical bar), but it does not give the expected result:

/\/(ebooks|magazines)\/[^\s"]+/

It gives this result instead:

[["ebooks"], ["magazines"]]

What should be the correct pattern?


Solution

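    # Use a non-capturing group (?:...) so that scan returns the whole
    # match instead of only the captured keyword.
    "\"/ebooks/1234.pdf\"  \"/magazines/4321.djvu\""
    .scan(/\/(?:ebooks|magazines|newspapers)\/[^\s"]+/)
    # => ["/ebooks/1234.pdf", "/magazines/4321.djvu"]

    # Alternatively, capture everything between double quotes and flatten.
    # Note that this version does not check for the keywords.
    "\"/ebooks/1234.pdf\"  \"/magazines/4321.djvu\""
    .scan(/"([^"]+)"/).flatten
    # => ["/ebooks/1234.pdf", "/magazines/4321.djvu"]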