python, python-re

Replacing commas with periods in text for decimal numbers (python)?


  1. I have a dataset with a field that contains text (both words and numeric data). As the screenshot shows, it includes decimal numbers written with commas, and I need them written with periods instead.

[screenshot]

I have previously tried writing a regex, but it replaces all commas in the text with periods.

Data_preprocessing['tweet_without_stopwords'] = Data_preprocessing['tweet_without_stopwords'].apply(lambda x: re.sub(",",'.', str(x)))

How do I write a regex that only affects decimal numbers? That is, wherever the text contains number,number, I want it to become number.number.

  2. This attempt broke the data:
Data_preprocessing['tweet_without_stopwords'] = Data_preprocessing['tweet_without_stopwords'].apply(lambda x: re.sub("(\d*)\.(\d*)","\1,\2", str(x)))

[screenshot]

Squares appeared :D

  3. Another attempt:

Data_preprocessing['tweet_without_stopwords'] = Data_preprocessing['tweet_without_stopwords'].apply(lambda x: re.sub("(\d+)\,(\d+)","\1.\2", str(x)))

Result again: [screenshot]


Solution

  • The pattern you need is "(\d+),(\d+)", with "\1.\2" as the replacement. Decomposition:

    (\d+)       at least one digit (group 1)
    ,           a literal ,
    (\d+)       at least one digit (group 2)
    

    Replacement:

    \1         group 1
    .          a period
    \2         group 2
    

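    Note the r prefix on both strings: without it, Python reads "\1" and "\2" as the control characters \x01 and \x02, which is why squares appeared. Applied to your code, the relevant section would be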

    lambda x: re.sub(r"(\d+),(\d+)",r"\1.\2", str(x))
    

    Here's a testbed that verifies this regex is correct
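
A check along these lines can also be run locally. This is only a minimal sketch: the helper name fix_decimals and the sample strings are made up for illustration and are not taken from your dataset.

```python
import re

# Pattern and replacement from the answer; raw strings keep \d, \1 and \2 intact.
PATTERN = re.compile(r"(\d+),(\d+)")
REPLACEMENT = r"\1.\2"


def fix_decimals(text):
    """Replace a comma that sits between two digit runs with a period."""
    return PATTERN.sub(REPLACEMENT, str(text))


# Hypothetical sample strings (assumptions, not the asker's data).
samples = {
    "price rose 3,5 percent": "price rose 3.5 percent",
    "hello, world": "hello, world",  # ordinary commas are left alone
    "values 1,25 and 10,75, then stop": "values 1.25 and 10.75, then stop",
}

for raw, expected in samples.items():
    assert fix_decimals(raw) == expected

# Without the r prefix, "\1" is the control character \x01, not a group reference,
# so the digits are replaced by unprintable characters.
assert "\1" == "\x01"
broken = re.sub(r"(\d+),(\d+)", "\1.\2", "3,5")
assert broken == "\x01.\x02"
```

With the dataframe from the question, the same helper could be passed to .apply(fix_decimals). Note that the pattern cannot tell a decimal comma from a thousands separator, so "1,000" would also become "1.000".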