Tags: python, regex, string-formatting, text-alignment

Python regex to globally replace trailing zeros with spaces


As a workaround to align floats on the decimal separator in tabular numeric data, I tried to find a regex that replaces trailing zeros with spaces (globally, after the string has been formatted), with the following rules:

  1. no trailing zeros are left after the last non-zero decimal digit
  2. if the first digit after the decimal separator is zero, keep it

Partly because Python's regex engine requires look-behind patterns to be fixed-width, I wasn't able to find a satisfactory solution. Here is a working example of my attempts (Python 3.x); do not rely on the vertical bars in your solution, they are in the example only for clarity:

import re
# formatmany is just a shortcut for building a multiline string of tabular data
formatmany=lambda f:lambda *s:'\n'.join(f.format(*x) for x in s)

my_list = [[12345, 12.345, 12.345, 12.345],
           [12340, 12.34 , 12.34 , 12.34 ],
           [12345, 12.005, 12.005, 12.005],
           [12340, 12.04 , 12.04 , 12.04 ],
           [12300, 12.3  , 12.3  , 12.3  ],
           [12000, 12.0  , 12.0  , 12    ]]
my_format = formatmany('|{:8d}|{:8.2f}|{:8.3f}|{:8.4f}|')
my_string = my_format(*my_list) # this is the formatted multiline string with trailing zeros

print('\nOriginal string:\n')
print(my_string)
print('\nTry 1:\n')
print(re.sub(r'(?<!\.)0+(?=[^0-9\.]|$)',lambda m:' '*len(m.group()),my_string))
print('\nTry 2:\n')
print(re.sub(r'(\d)0+(?=[^\d]|$)',r'\1',my_string))

which prints

Original string:

|   12345|   12.35|  12.345| 12.3450|
|   12340|   12.34|  12.340| 12.3400|
|   12345|   12.01|  12.005| 12.0050|
|   12340|   12.04|  12.040| 12.0400|
|   12300|   12.30|  12.300| 12.3000|
|   12000|   12.00|  12.000| 12.0000|

Try 1:

|   12345|   12.35|  12.345| 12.345 |
|   1234 |   12.34|  12.34 | 12.34  |
|   12345|   12.01|  12.005| 12.005 |
|   1234 |   12.04|  12.04 | 12.04  |
|   123  |   12.3 |  12.3  | 12.3   |
|   12   |   12.0 |  12.0  | 12.0   |

Try 2:

|   12345|   12.35|  12.345| 12.345|
|   1234|   12.34|  12.34| 12.34|
|   12345|   12.01|  12.005| 12.005|
|   1234|   12.04|  12.04| 12.04|
|   123|   12.3|  12.3| 12.3|
|   12|   12.0|  12.0| 12.0|

Try 1 also replaces trailing zeros in integers; Try 2 was taken from another solution for stripping trailing zeros from a single float, so it removes the zeros instead of padding them and also affects integers. Both are unsatisfactory, since the desired output is:

|   12345|   12.35|  12.345| 12.345 |
|   12340|   12.34|  12.34 | 12.34  |
|   12345|   12.01|  12.005| 12.005 |
|   12340|   12.04|  12.04 | 12.04  |
|   12300|   12.3 |  12.3  | 12.3   |
|   12000|   12.0 |  12.0  | 12.0   |

Why this is not a duplicate question

  1. Python's regex engine differs slightly from the engines of other languages, so solutions given for other languages do not automatically apply
  2. Trailing zeros are to be replaced with spaces, not stripped
  3. This is about global replacement of many occurrences in a multiline string, not just a single occurrence
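
As a small illustration of point 1, the sketch below checks whether Python's re module accepts two look-behind patterns. The helper name compiles and both patterns are made up for this example; the variable-width one is the kind of pattern that an engine with variable-length look-behind (.NET's, for instance) would accept.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Look-behind of unbounded length: a dot followed by one or more digits.
variable_width = r'(?<=\.\d+)0+'
# Look-behind of fixed length: a dot followed by exactly one digit.
fixed_width = r'(?<=\.\d)0+'

print(compiles(variable_width))  # rejected by re
print(compiles(fixed_width))     # accepted by re
```

The fixed-width variant compiles, but it only looks at one digit after the separator, which is why it is not enough on its own here.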

Solution

  • stribizhev's earlier (but not completely satisfactory) answer gave me the idea for a general solution:

    re.sub(r'(?<=\.)(\d+?)(0+)(?=[^\d]|$)',lambda m:m.group(1)+' '*len(m.group(2)),my_string)
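
For convenience, here is a minimal self-contained sketch that applies this substitution to the table from the question. The names rows, table, TRAILING_ZEROS and blank_trailing_zeros are introduced only for this example; the pattern and the replacement function are the ones from the solution.

```python
import re

rows = [[12345, 12.345, 12.345, 12.345],
        [12340, 12.34, 12.34, 12.34],
        [12345, 12.005, 12.005, 12.005],
        [12340, 12.04, 12.04, 12.04],
        [12300, 12.3, 12.3, 12.3],
        [12000, 12.0, 12.0, 12]]

# Same layout as in the question: one integer column, three float columns.
table = '\n'.join('|{:8d}|{:8.2f}|{:8.3f}|{:8.4f}|'.format(*row) for row in rows)

# (?<=\.)       start right after the decimal separator (fixed-width look-behind)
# (\d+?)        shortest run of digits to keep (at least one, so "12.0" survives)
# (0+)          the trailing zeros to blank out
# (?=[^\d]|$)   must be followed by a non-digit or the end of the string
TRAILING_ZEROS = re.compile(r'(?<=\.)(\d+?)(0+)(?=[^\d]|$)')

def blank_trailing_zeros(text):
    """Replace trailing zeros of decimal fractions with the same number of spaces."""
    return TRAILING_ZEROS.sub(lambda m: m.group(1) + ' ' * len(m.group(2)), text)

result = blank_trailing_zeros(table)
print(result)
```

Integers are left untouched because the match can only start right after a decimal separator, and the lazy (\d+?) group always keeps at least one digit after it, which covers rule 2.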