The result list contains single spaces when splitting a string with re.split("( )+") – is there a better way?


I have the output of a command in tabular form. I'm parsing this output from a result file and storing it in a string. The elements in each row are separated by one or more space characters, so I'm using a regular expression that matches one or more spaces to split the row. However, a single-space element shows up between every pair of elements in the result:

>>> import re
>>> str1 = "a    b     c      d"  # spaces are irregular
>>> str1
'a    b     c      d'
>>> str2 = re.split("( )+", str1)
>>> str2
['a', ' ', 'b', ' ', 'c', ' ', 'd']  # 1 space element between!

Is there a better way to do this?

After each split, str2 is appended to a list.


Solution

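  • The parentheses in "( )+" create a capturing group, and re.split includes the text matched by any capturing group in the result list. If you simply remove the parentheses, you will not have this problem.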

    >>> str1 = "a    b     c      d"
    >>> re.split(" +", str1)
    ['a', 'b', 'c', 'd']
    

    However, there is no need for a regex here: str.split with no separator argument splits on runs of whitespace and ignores leading and trailing whitespace. This would be the best way in this case.

    >>> str1.split()
    ['a', 'b', 'c', 'd']
    

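    If you really want a regex, you can use this (\s matches any whitespace character, not just a space, so the intent is clearer; the raw string keeps Python from treating the backslash as a string escape):

    >>> re.split(r"\s+", str1)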
    ['a', 'b', 'c', 'd']
    

    Or you can find all runs of non-whitespace characters:

    >>> re.findall(r'\S+',str1)
    ['a', 'b', 'c', 'd']