Search code examples
pythonregexwhitespacesymbolstokenize

Split string with regex by new lines, symbols and withspaces in python


I'm new to regex library, and I'm trying to make from a text like this

"""constructor SquareGame new(){
let square=square;
}"""

This outputs a list:

['constructor', 'SquareGame', 'new', '(', ')', '{', '\n', 'let', 'square', '=',  'square', ';', '}']

I need to create a list of tokens separated by white spaces, new lines and this symbols {}()[].;,+-*/&|<>=~.

I used re.findall('[,;.()={}]+|\S+|\n', text) but seems to separate tokens by withe spaces and new lines only.


Solution

  • This regex will only match based on the characters that you indicated and I think this is a safer method.

    >>> re.findall(r"\w+|[{}()\[\].;,+\-*/&|<>=~\n]", text)
    ['constructor', 'SquareGame', 'new', '(', ')', '{', '\n', 'let', 'square', '=', 'square', ';', '\n', '}'