python regex split python-re

Text is split differently depending on the order of the delimiters


The code is supposed to split the string without removing the delimiters.

import re
operations = '8-8/84'
operations = re.split(r'([+,*,/,-])', operations)

Executing the code, operations ends up with this value:

['8', '-', '8', '/', '84']

But if the '-' is not the last character in the delimiter class [+,*,/,-], that is, if the class ends with any other delimiter, the '-' is no longer treated as a delimiter. For example, with:

import re
operations = '8-8/84'
operations = re.split(r'([+,*,-,/])', operations)

the final value of 'operations' will be:

['8-8', '/', '84']

Why does this occur only with '-'? I made sure the '-' in the delimiters is the same character as the '-' in the initial value of operations by copying and pasting it. I am using Python 3.


Solution

  • Within a character class (i.e. [...]) a hyphen between two characters denotes a range of characters. For example, [a-z] is commonly used to mean all 26 lowercase letters. The class [,-,] therefore means all characters from , to , which is just the comma itself.

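    The class [+,*,/,-] is equivalent to [+,*/-], because repeating a character inside a class has no effect. Here the hyphen comes last, so it is taken literally.

    The class [+,*,-,/] is equivalent to [+,*/], because the ,-, in the middle is read as a range from comma to comma and the hyphen itself is never matched.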

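    To include a literal hyphen in a character class, it must be either the first or the last character in the class, or be escaped with a preceding backslash. Thus, to add a hyphen to [+,*/], use [-+,*/], [+,*/-] or [+,\-*/]. Note that commas are ordinary characters inside a class, not separators, so all of these classes also match a literal comma; leave the commas out (e.g. [-+*/]) if a comma should not be a delimiter.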
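The behaviour described above can be checked with a short sketch. The helper name split_keep is hypothetical and is introduced only for illustration; it wraps a character class in a capturing group so that re.split keeps the delimiters, as in the question.

```python
import re

def split_keep(char_class, text):
    """Split text on char_class, keeping the matched delimiters.

    split_keep is an illustrative helper, not part of the re module.
    """
    return re.split('(' + char_class + ')', text)

operations = '8-8/84'

# Hyphen last in the class: it is a literal, so '-' is a delimiter.
hyphen_last = split_keep(r'[+,*,/,-]', operations)

# Hyphen between two commas: parsed as the range ',' to ',',
# so '-' itself is not matched.
hyphen_as_range = split_keep(r'[+,*,-,/]', operations)

# Escaped hyphen: position in the class no longer matters.
hyphen_escaped = split_keep(r'[+\-*/]', operations)

# Commas inside a class are literal characters, not separators.
comma_matches = split_keep(r'[+,*/-]', '1,2')

print(hyphen_last)      # ['8', '-', '8', '/', '84']
print(hyphen_as_range)  # ['8-8', '/', '84']
print(hyphen_escaped)   # ['8', '-', '8', '/', '84']
print(comma_matches)    # ['1', ',', '2']
```

Escaping the hyphen is arguably the most robust choice, since the pattern then keeps working if the characters in the class are later reordered.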