
re.findall() function python


Can you please help me understand the following line of code:

import re
a = re.findall(r'[А-Яа-я-\s]+', string)

I am a bit confused by the pattern to be matched in the string. In particular, I read it as: the string should start with A and end with any string between A and я, separated by - and a space. But what does the second term Яа stand for?


Solution

    [         ]      any one of the characters inside the brackets
     А-Я             any character from А to Я, inclusive
        а-я          any character from а to я, inclusive
           -         a literal hyphen (Python accepts it here, but it is easy to misread; putting it at the very start or end of the class, or escaping it as \-, is clearer)
            \s       any whitespace character
               +     one or more characters from the preceding class

    [А-Яа-я-\s]+     one or more consecutive characters, each of which is an uppercase letter from А to Я, a lowercase letter from а to я, a hyphen, or whitespace
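A quick way to check this breakdown is to run the pattern against a mixed sample string (the sample text is made up for illustration). Latin letters and digits end a match, while whitespace and the hyphen stay inside it:

```python
import re

# The pattern from the question, written as a raw string so "\s" reaches re intact.
# The class accepts А-Я, а-я, a literal hyphen and any whitespace; "+" groups runs.
pattern = re.compile(r'[А-Яа-я-\s]+')

text = "Hello Привет-мир 123 Пока"  # hypothetical sample input
matches = pattern.findall(text)
print(matches)  # [' Привет-мир ', ' Пока']
```

Note that the leading spaces end up inside the matches, because \s is part of the same class as the letters.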
    

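    The [] is called a "character class" in regex: it means "any one of the characters listed inside is valid here". The + after it means "one or more occurrences of the preceding character or class". So the pattern does not require the string to start with А or end with я, and Яа is not a term of its own: Я closes the range А-Я and а opens the range а-я. re.findall returns a list of every non-overlapping run of matching characters. Python's Regular Expression HOWTO is worth reading through.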
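To see what the + changes, and that the hyphen can be moved to the end of the class without changing what matches, the calls below run a few variants on short sample words (the words are chosen for illustration). The last call shows a side effect of code-point ranges: Ё and ё are not part of А-Я or а-я.

```python
import re

word = "Да-да"  # hypothetical sample input

# Without "+", the class matches exactly one character per match.
single = re.findall(r'[А-Яа-я-\s]', word)
# With "+", consecutive matching characters form a single match.
runs = re.findall(r'[А-Яа-я-\s]+', word)
# Moving the hyphen to the end of the class describes the same set of characters.
hyphen_last = re.findall(r'[А-Яа-я\s-]+', word)

# Ё/ё (U+0401/U+0451) lie outside the code-point ranges А-Я (U+0410..U+042F)
# and а-я (U+0430..U+044F), so they are not matched.
yo = re.findall(r'[А-Яа-я-\s]+', "ёлка")

print(single)       # ['Д', 'а', '-', 'д', 'а']
print(runs)         # ['Да-да']
print(hyphen_last)  # ['Да-да']
print(yo)           # ['лка']
```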