
Matching patterns in spaCy returns an empty result


I was hoping to find some patterns with this simple code, but the result is empty. Am I forgetting something?

for tk in doc[:30]:
    print(tk.text, ':', tk.pos_)

Método : NOUN de : ADP avaliaçãoSimulação : NOUN computacional : ADJ conforme : ADP procedimentos : NOUN apresentados : VERB em : ADP : SPACE Edifi : PROPN cações : NOUN em : ADP fase : NOUN de : ADP projetoA : NOUN avaliação : NOUN deve : VERB ser : AUX feita : VERB para : ADP um : NUM dia : NOUN típico : ADJ de : ADP projeto : NOUN de : ADP verão : NOUN e : CCONJ de : ADP

from spacy.matcher import Matcher

# Intended: a NOUN followed by an ADP
pattern = [
    {'POS': 'NOUN'},
    {'LOWER': 'ADP'},
]

# Matcher class object
matcher = Matcher(nlp.vocab)
matcher.add("matching_1", patterns=[pattern])

result = matcher(doc, as_spans=True)

print(result)

[]

I was expecting the pattern of POS tags 'NOUN' + 'ADP' to find the words: 'Método de', 'cações em', 'fase de', 'projeto de'.


Solution

  • The following rule matches a token whose lowercased text equals "ADP". It can never match anything, because lowercased text contains no capital letters and so can never equal "ADP".

    {'LOWER': 'ADP'},
    

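    I am not sure what this is supposed to match. If you want any token tagged as an adposition, which is what your expected output suggests, the rule is simply {"POS": "ADP"}. Or maybe you want to match a lowercase word with POS = ADP? In that case note that REGEX is an operator applied to an attribute such as TEXT, so you would want a rule like this:

    {"POS": "ADP", "TEXT": {"REGEX": "^[a-z]+$"}}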
    

    To restate what I said above: {'LOWER': 'ADP'} does not match a lowercase word with the ADP part of speech. "LOWER" compares a token's text, not its part-of-speech tag.

    Let me give an example. {"LOWER": "dog"} will match words like "Dog", "DOG", or "dog". It will not match words with the part of speech "dog" (which do not exist). "LOWER": value means, "match words which look like value when they are made lowercase".

    If you want to match lowercase words that have the ADP part of speech, use the rule above that combines POS with REGEX.
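To make the difference concrete, here is a minimal sketch that runs the original pattern and a POS-based one side by side. It makes a few assumptions that are not in the question: it uses a blank Portuguese pipeline (spacy.blank("pt")) instead of the asker's trained model, it builds the Doc by hand from a short phrase assembled from tokens in the question's output, and the helper name find is made up for illustration. It also assumes spaCy v3, where Doc accepts a pos argument and the matcher accepts as_spans=True.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

# Blank Portuguese pipeline: vocab and tokenizer only, no trained model required.
nlp = spacy.blank("pt")

# Hand-built Doc so the tags are fixed. The words and tags are copied from
# the question's output (assumption: spliced into one short phrase), so they
# are not fresh tagger predictions.
words = ["fase", "de", "projeto", "de", "verão", "e", "de"]
pos = ["NOUN", "ADP", "NOUN", "ADP", "NOUN", "CCONJ", "ADP"]
doc = Doc(nlp.vocab, words=words, pos=pos)


def find(pattern):
    """Hypothetical helper: run one pattern over doc and return the matched spans."""
    matcher = Matcher(nlp.vocab)
    matcher.add("noun_adp", [pattern])
    return matcher(doc, as_spans=True)


# Original pattern: LOWER compares the lowercased token text with "ADP",
# which no lowercased text can equal.
broken = find([{"POS": "NOUN"}, {"LOWER": "ADP"}])

# Corrected pattern: compare the part-of-speech tag instead.
fixed = find([{"POS": "NOUN"}, {"POS": "ADP"}])

print(broken)
print([span.text for span in fixed])
```

With this hand-built Doc, the first print should show an empty list and the second should show the spans "fase de" and "projeto de". On a real document the results depend on what the tagger assigns, so the matches may differ from the ones expected in the question.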