spaCy Matcher unable to identify the pattern beyond the first token

I can't figure out where my pattern went wrong to produce this result.

The sentence I want to find: "#1 – January 31, 2015", and any other date that follows this format.

The pattern: pattern1=[{'ORTH':'#'},{'is_digital':True},{'is_space':True},{'ORTH':'-'},{'is_space':True},{'is_alpha':True},{'is_space':True},{'is_digital':True},{'is_punct':True},{'is_space':True},{'is_digital':True}]

The print code: print("Matches1:", [doc[start:end].text for match_id, start, end in matches1])

The result: ['#', '#', '#']

Expected result: ['#1 – January 31, 2015','#5 – March 15, 2017','#177 – November 22, 2019']

Solution

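spaCy's Matcher operates over tokens, and single spaces between words do not produce tokens, so the {'is_space':True} entries in the pattern have nothing to match. The token attribute for digits is IS_DIGIT; is_digital is not a spaCy attribute. Also, several characters resemble a hyphen (en dash, em dash, minus sign, and so on), so be careful about which one you type: the sentence contains an en dash (–), while the original pattern looks for a hyphen (-). The following code works: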

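import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_lg')

# One dict per token: '#', number, en dash, month, day, comma, year.
pattern1 = [{'ORTH': '#'}, {'IS_DIGIT': True}, {'ORTH': '–'}, {'IS_ALPHA': True}, {'IS_DIGIT': True}, {'IS_PUNCT': True}, {'IS_DIGIT': True}]

doc = nlp("#1 – January 31, 2015")
print("Tokens:", [token.text for token in doc])
# Tokens: ['#', '1', '–', 'January', '31', ',', '2015']

matcher = Matcher(nlp.vocab)
# spaCy v3 signature; in spaCy v2 this was matcher.add("p1", None, pattern1)
matcher.add("p1", [pattern1])

matches1 = matcher(doc)
print("Matches1:", [doc[start:end].text for match_id, start, end in matches1])
# Matches1: ['#1 – January 31, 2015']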
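
Since look-alike dash characters are easy to confuse, one way to check which one a sentence actually contains is to print the Unicode code point and name of each symbol. This is only a minimal sketch using the standard library; the helper name describe_chars is made up for illustration and is not part of spaCy.

```python
import unicodedata

def describe_chars(text):
    """Return (char, code point, Unicode name) for each character
    that is neither alphanumeric nor whitespace."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not ch.isalnum() and not ch.isspace()
    ]

# Symbols in the sentence from the question
for row in describe_chars("#1 – January 31, 2015"):
    print(row)
# ('#', 'U+0023', 'NUMBER SIGN')
# ('–', 'U+2013', 'EN DASH')
# (',', 'U+002C', 'COMMA')

# The character used in the original pattern
print(describe_chars("-"))
# [('-', 'U+002D', 'HYPHEN-MINUS')]
```

The two dashes have different code points, so an ORTH match on one will never match the other. Copying the character straight from the text into the pattern avoids the mismatch.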