
Why is my Python re pattern not working for splitting at spaces?


I'm trying to split text at all punctuation, for both English and Russian. This works except for spaces: for some reason \s is not working. allRussianWords ends up containing spaces, but I do not want it to.

allRussianWords = re.split("[—…();«»!?.:,%\s\n]",words)

This is the string that I am attempting to split:

words = "привет, моё имя Мэтт. Как ты?"

The text is in Russian.


Solution

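  • It seems you need a + after the closing square bracket, so that a run of consecutive separator characters (such as a comma followed by a space) is matched as a single delimiter. Without it, each separator is matched on its own, and re.split returns an empty string between two adjacent separators. One of the other answers points this out, too.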

    The \n is also redundant, because \s already matches the newline character.
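A minimal sketch of the difference, using the string from the question. The names without_plus and with_plus are only illustrative, and the final filtering step is an addition that is not part of the answer above: it drops the empty string that re.split leaves when the text ends with a separator.

```python
import re

words = "привет, моё имя Мэтт. Как ты?"

# Without +, every separator character is matched on its own, so two
# adjacent separators (", " or ". ") leave an empty string between them.
without_plus = re.split(r"[—…();«»!?.:,%\s]", words)

# With +, a run of separators is treated as one delimiter.
# \n is omitted because \s already covers it.
with_plus = re.split(r"[—…();«»!?.:,%\s]+", words)

# The string ends with "?", so one empty string remains at the end;
# filtering removes it.
allRussianWords = [w for w in with_plus if w]

print(without_plus)
print(with_plus)
print(allRussianWords)
```

The pattern is written as a raw string (r"...") so that \s reaches the regex engine unchanged.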