ruby regex alphanumeric non-alphanumeric

How do I match non-letters and non-numbers after a bunch of numbers?


I'm using Ruby 2.4. I want to match zero or more characters that are neither letters nor numbers, followed by one or more numbers, followed by an arbitrary number of characters that are neither letters nor numbers. However, this string

2.4.0 :001 > token = "17 Milton,GA"
 => "17 Milton,GA"
...
2.4.0 :004 > Regexp.new("\\A([[:space:]]|[^\p{L}^0-9])*\\d+[^\p{L}^0-9]*\\z").match?(token.downcase)
 => true

is matching my regular expression, and I don't want it to, since letters follow the number. What do I need to adjust in my regexp so that the only thing that can match after the numbers is non-letters and non-numbers?


Solution

  • There are a couple of issues with the regex.

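    1) When you use a double-quoted string literal in the Regexp.new constructor, you need to double each backslash to get a literal backslash (\p => \\p). As written, "\p" loses its backslash, so the class contains the literal characters p, {, L and } instead of the \p{L} property.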

    2) [^\p{L}^0-9] is the wrong construct for "any char but a letter or digit" because the second ^ is treated as a literal ^ symbol. You need to remove the second ^ at least. You may also use [^[:alnum:]] to match any non-alphanumeric symbol.

    3) The pattern above matches whitespace, too, so you do not need to alternate it with [[:space:]]: ([[:space:]]|[^\p{L}^0-9])* -> [^\p{L}0-9]*.
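
The three points above can be checked directly in Ruby. This is a small sketch, not part of the original answer; the constant names are made up for illustration.

```ruby
# Point 1: inside a double-quoted string, "\p" silently loses its backslash,
# while "\\p" keeps it.
BROKEN_SOURCE = "[^\p{L}^0-9]"
FIXED_SOURCE  = "[^\\p{L}0-9]"
puts BROKEN_SOURCE  # [^p{L}^0-9]
puts FIXED_SOURCE   # [^\p{L}0-9]

# This is why the original pattern accepts the lowercased sample string:
# the trailing class only excludes p, {, L, }, ^ and the digits 0-9.
ORIGINAL = Regexp.new("\\A([[:space:]]|[^\p{L}^0-9])*\\d+[^\p{L}^0-9]*\\z")
puts ORIGINAL.match?("17 milton,ga")  # true

# Point 2: a ^ that is not the first character of a class is a literal caret.
WITH_EXTRA_CARET = /[^\p{L}^0-9]/
WITHOUT_CARET    = /[^\p{L}0-9]/
puts WITH_EXTRA_CARET.match?("^")  # false
puts WITHOUT_CARET.match?("^")     # true

# Point 3: the negated class already covers whitespace.
puts WITHOUT_CARET.match?(" ")     # true
```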

    So, you may use your fixed Regexp.new("\\A[^\\p{L}0-9]*\\d+[^\\p{L}0-9]*\\z") regexp, or use

    /\A[^[:alnum:]]*\d+[^[:alnum:]]*\z/.match?(token.downcase)
    

    See the Rubular demo where your sample string is not matched with the regex.
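
If the demo is not available, a quick local check along the following lines should show the same behaviour. The sample tokens other than "17 Milton,GA" and the constant names are assumptions added for illustration.

```ruby
# Both fixed variants from the answer.
FROM_LITERAL = /\A[^[:alnum:]]*\d+[^[:alnum:]]*\z/
FROM_STRING  = Regexp.new("\\A[^\\p{L}0-9]*\\d+[^\\p{L}0-9]*\\z")

# Token => whether it should be accepted.
SAMPLES = {
  "17 Milton,GA" => false, # letters after the number
  "17"           => true,  # digits only
  "  17, "       => true,  # whitespace and punctuation around the digits
  "-- 42 !!"     => true,  # punctuation on both sides
  "abc 17"       => false, # letters before the number
  "17 18"        => false  # a second run of digits after the first
}

SAMPLES.each do |token, expected|
  results = [FROM_LITERAL, FROM_STRING].map { |re| re.match?(token.downcase) }
  puts "#{token.inspect}: #{results.inspect} (expected #{expected})"
end
```

The two patterns agree on these ASCII samples. They are not guaranteed to agree on non-ASCII input, because [[:alnum:]] and [\p{L}0-9] do not cover exactly the same set of characters.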

    Details:

    • \A - start of a string
    • [^[:alnum:]]* - 0+ non-alphanumeric chars
    • \d+ - 1+ digits
    • [^[:alnum:]]* - 0+ non-alphanumeric chars
    • \z - end of string.
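
The same breakdown can be kept next to the pattern by writing it in extended mode (the x flag), where whitespace and # comments inside the regex are ignored. The sketch below also wraps the digits in a capture group, which the original pattern does not do; DIGITS_ONLY_TOKEN and extract_number are hypothetical names used only for this illustration.

```ruby
DIGITS_ONLY_TOKEN = /
  \A              # start of string
  [^[:alnum:]]*   # 0+ non-alphanumeric chars
  (\d+)           # 1+ digits, captured as group 1 (an addition for this sketch)
  [^[:alnum:]]*   # 0+ non-alphanumeric chars
  \z              # end of string
/x

# Hypothetical helper: returns the digits as a string, or nil when the token
# contains letters or a second run of digits.
def extract_number(token)
  token[DIGITS_ONLY_TOKEN, 1]
end

p extract_number("  17, ")        # "17"
p extract_number("17 Milton,GA")  # nil
```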