Tags: python, regex, string, substring, python-re

How to delete the numbers between two delimiters?


I have some garbage data:

trueText = ' 23  Wolkenvelden en   lokaal wat regen. In  de ochtend op  steeds meer  plaatsen droog en  24 zon. In de avond   kans op onweer,met   name in Zeeland.   22      20 23   = max. temp.  vandaag   '

I want to delete the numbers that appear between the delimiter characters because they are useless. The text itself can sometimes contain a number, which is why I only want to delete the numbers that sit between the delimiters.

I have tried some things myself:

trueText = re.sub('[^]+', ' ', trueText)

This deletes everything between the characters, not just the numbers. I think I need the \d sequence, but I can't get the syntax right.
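A minimal sketch of the \d approach, assuming the delimiter is a single character. The actual delimiter characters are not visible in the text above, so the sketch uses | as a stand-in, and the helper name strip_digits_between is hypothetical. It matches each "delimiter, run of non-delimiter characters, delimiter" span and runs a second re.sub with \d only inside that span, so numbers outside the delimiters are left alone:

```python
import re


def strip_digits_between(text: str, delim: str) -> str:
    """Remove digits only inside delim...delim spans (hypothetical helper).

    `delim` stands in for the question's delimiter, which is not visible
    in the posted data; it must be a single character.
    """
    if len(delim) != 1:
        raise ValueError("delim must be a single character")
    d = re.escape(delim)
    # Opening delimiter, one or more non-delimiter characters, closing delimiter.
    span = re.compile(f"{d}[^{d}]+{d}")
    # Inside each matched span, delete every \d character and keep the rest,
    # including the delimiters themselves.
    return span.sub(lambda m: re.sub(r"\d", "", m.group()), text)


sample = "Route 66 |23| Wolken |24| zon"
print(strip_digits_between(sample, "|"))  # Route 66 || Wolken || zon

# For comparison, replacing the whole span (as in the attempt above)
# throws away the text between the delimiters as well.
print(re.sub(r"\|[^|]+\|", " ", sample))  # Route 66   Wolken   zon
```

The "66" survives because it is not inside a delimited span, which is the behaviour the question asks for.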


Solution

  • You can remove only the digits from each matched span by passing a function as the replacement:

    trueText = re.sub('[^]+', lambda x: ''.join(c for c in x.group() if not c.isdigit()), trueText)
    

    See the Python demo:

    import re
    trueText = ' 23  Wolkenvelden en   lokaal wat regen. In  de ochtend op  steeds meer  plaatsen droog en  24 zon. In de avond   kans op onweer,met   name in Zeeland.   22      20 23   = max. temp.  vandaag   '
    print(re.sub('[^]+', lambda x: ''.join(c for c in x.group() if not c.isdigit()), trueText))
    

    Output:

       Wolkenvelden en   lokaal wat regen. In  de ochtend op  steeds meer  plaatsen droog en   zon. In de avond   kans op onweer,met   name in Zeeland.             = max. temp.  vandaag   