python spacy matcher

Complex (repeating) rule using the spaCy pattern Matcher


I want to match a repeating pattern using spaCy's pattern matcher. This is the kind of text I want to match: My account number is: 2893-26492-634-0924-63. Some more text here. Basically, I am trying to match the following regex: \d+(-\d+)*

from spacy.matcher import Matcher

matcher = Matcher(nlp.vocab)  # nlp is an already loaded spaCy pipeline
matcher.add('NUMBER_MERGE', None, [{'IS_DIGIT': True}, {'IS_PUNCT': True}, {'IS_DIGIT': True}, {'IS_SPACE': True}])

This matches "342-234 Text" but fails for "342-234-958 Text".

I could not find any documentation on applying a quantifier to a group of token patterns rather than to a single one. Any help would be appreciated.
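For reference, the intended behaviour can be checked with Python's built-in `re` module, independent of spaCy. This is only a sketch of what the regex from the question should capture on the sample strings; the names `NUMBER_RE`, `samples` and `matches` are illustrative, and the group is written as non-capturing (`(?:...)`), which does not change what is matched.

```python
import re

# The regex from the question: a run of digits, then zero or more "-digits" groups.
NUMBER_RE = re.compile(r"\d+(?:-\d+)*")

samples = [
    "342-234 Text",
    "342-234-958 Text",
    "My account number is: 2893-26492-634-0924-63. Some more text here.",
]

# Take the first match found in each sample string.
matches = [NUMBER_RE.search(s).group(0) for s in samples]
print(matches)
```

Each sample should yield the whole hyphenated number, however many groups it has.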


Solution

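  • You can use the regex directly as a token pattern through the REGEX operator. Write it as a raw string so the backslashes reach the regex engine unchanged. Note that REGEX is checked against the text of each token individually. The call below uses the spaCy v2 signature from the question, where None is the on_match callback; in spaCy v3 the second argument is a list of patterns instead.

    matcher.add('NUMBER_MERGE', None, [{"TEXT": {"REGEX": r"\d+(-\d+)*"}}])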
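If the tokenizer splits the number at its hyphens, a token-level regex cannot return the whole number as one span. One possible workaround, sketched here, is to run the regex over `doc.text` and map the character offsets back to tokens with `Doc.char_span`. This is not the answer's own method. It assumes a blank English pipeline (`spacy.blank("en")`) is sufficient because only the tokenizer is needed, and the variable names are illustrative.

```python
import re
import spacy

# Assumption: a blank English pipeline is enough, since only tokenization is used.
nlp = spacy.blank("en")
doc = nlp("My account number is: 2893-26492-634-0924-63. Some more text here.")

# Same regex as in the question, with a non-capturing group.
number_re = re.compile(r"\d+(?:-\d+)*")

spans = []
for m in number_re.finditer(doc.text):
    # char_span returns None when the offsets do not line up with token boundaries.
    span = doc.char_span(m.start(), m.end())
    if span is not None:
        spans.append(span)

number_texts = [span.text for span in spans]
print(number_texts)
```

Each resulting `Span` covers all tokens of the number, so it can be merged or labelled as a unit afterwards.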