I'm using Ruby 2.4. I want to match any number of characters that are neither letters nor digits, followed by one or more digits, followed by any number of characters that are neither letters nor digits. However, this string
2.4.0 :001 > token = "17 Milton,GA"
=> "17 Milton,GA"
...
2.4.0 :004 > Regexp.new("\\A([[:space:]]|[^\p{L}^0-9])*\\d+[^\p{L}^0-9]*\\z").match?(token.downcase)
=> true
is matching my regular expression, and I don't want it to, since there are letters after the number. What do I need to adjust in my regexp so that only non-letters and non-digits can follow the number?
There are a couple of issues with the regex.
1) When you pass a double-quoted string literal to the Regexp.new constructor, you need to double every backslash that must reach the regex engine (\p => \\p). Otherwise Ruby drops the backslash from the unrecognized escape \p, and the engine sees a plain p.
2) [^\p{L}^0-9] is the wrong construct for "any char but a letter or digit" because the second ^ is treated as a literal ^ symbol. You need to remove the second ^ at least. You may also use [^[:alnum:]] to match any non-alphanumeric symbol.
3) The corrected class matches whitespace, too, so you do not need to alternate it with [[:space:]]: ([[:space:]]|[^\p{L}^0-9])* -> [^\p{L}0-9]*.
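Points 1 and 2 can be checked directly in irb. The snippet below is only a sketch: the variable names (broken, fixed, broken_re, fixed_re) are made up for illustration, and " milton,ga" is the part of the downcased sample token that follows the digits.

```ruby
# "\p" is not a recognized escape in a double-quoted string,
# so Ruby keeps the "p" and drops the backslash.
broken = "[^\p{L}^0-9]"    # what the question's string literal produces
fixed  = "[^\\p{L}0-9]"    # doubled backslash, second ^ removed

puts broken   # [^p{L}^0-9]
puts fixed    # [^\p{L}0-9]

# The broken class only excludes the literal characters p, {, L, }, ^ and 0-9,
# so ordinary lowercase letters slip through it.
broken_re = Regexp.new("\\A#{broken}*\\z")
fixed_re  = Regexp.new("\\A#{fixed}*\\z")

puts broken_re.match?(" milton,ga")   # true
puts fixed_re.match?(" milton,ga")    # false

# Even with the backslash doubled, a second ^ inside the class is a literal caret,
# which means the negated class refuses to match "^" itself.
puts Regexp.new("\\A[^\\p{L}^0-9]\\z").match?("^")   # false
puts Regexp.new("\\A[^\\p{L}0-9]\\z").match?("^")    # true
```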
So, you may use your fixed regexp, Regexp.new("\\A[^\\p{L}0-9]*\\d+[^\\p{L}0-9]*\\z"), or use a regex literal:
/\A[^[:alnum:]]*\d+[^[:alnum:]]*\z/.match?(token.downcase)
See the Rubular demo where your sample string is not matched with the regex.
Details:
\A - start of string
[^[:alnum:]]* - 0+ non-alphanumeric chars
\d+ - 1+ digits
[^[:alnum:]]* - 0+ non-alphanumeric chars
\z - end of string.
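As a quick sanity check, the literal regex can be run against a few tokens. This is a minimal sketch: the constant NUMERIC_TOKEN, the helper numeric_token?, and every sample string other than "17 Milton,GA" are assumptions added for illustration, not part of the original question.

```ruby
# Literal form of the regex from the answer (needs Ruby 2.4+ for match?).
NUMERIC_TOKEN = /\A[^[:alnum:]]*\d+[^[:alnum:]]*\z/

# Hypothetical helper: true when the token is one run of digits
# surrounded only by non-alphanumeric characters.
def numeric_token?(token)
  NUMERIC_TOKEN.match?(token.downcase)
end

puts numeric_token?("17 Milton,GA")   # false: letters follow the digits
puts numeric_token?("17")             # true
puts numeric_token?(" -- 17 !!")      # true: only punctuation and spaces around the digits
puts numeric_token?("abc 17")         # false: letters precede the digits
puts numeric_token?("--")             # false: \d+ needs at least one digit
```

Because the pattern is anchored with \A and \z, a single letter anywhere in the token is enough to make the match fail.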