Tags: regex, python-3.x, nlp, numeric-ranges

Numeric Ranges with a Regular Expression in Python


So I am working on a text analytics problem, and I am trying to remove all the numbers between 0 and 999 from the text with a regular expression in Python. I tried using a Regex Numeric Range Generator to get the regular expression, but I didn't succeed: I can only remove all the numbers.

I have tried several regexes, but none of them worked. Here's what I tried:

# Remove numbers starting from 0 ==> 999
data_to_clean = re.sub('[^[0-9]{1,3}$]', ' ', data_to_clean)

I have tried this also:

# Remove numbers starting from 0 ==> 999
data_to_clean = re.sub('\b([0-9]|[1-8][0-9]|9[0-9]|[1-8][0-9]{2}|9[0-8][0-9]|99[0-9])\b', ' ', data_to_clean)  

this one:

^([0-9]|[1-8][0-9]|9[0-9]|[1-8][0-9]{2}|9[0-8][0-9]|99[0-9])$

and this:

def clean_data(data_to_clean):
    # Remove numbers starting from 0 ==> 999
    data_to_clean = re.sub('[^[0-9]{1,3}$]', ' ', data_to_clean)  
    return data_to_clean

I have a lot of numbers, but I need to delete only the ones with at most three digits (0 to 999) and keep the others.

Thank you for your help.


Solution

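  • You need to precede the pattern string with an r to make it a raw string. Otherwise Python interprets \b in the string literal as the backspace character instead of passing it to the regex engine as a word boundary. You can also simplify the pattern like this: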

    import re

    data_to_clean = re.sub(r'\b([0-9]|[1-9][0-9]{1,2})\b', ' ', data_to_clean)
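A minimal sketch of the fix in context, assuming the text is a plain Python str. The function name clean_data comes from the question; the names SMALL_NUMBER and GENERATED and the sample sentence are only illustrative. It also checks that the simplified pattern and the generator-style pattern from the question (written as raw strings) accept exactly the same numbers.

```python
import re

# Simplified pattern: one digit (0-9), or two or three digits that do not
# start with 0 (10-999). The \b anchors keep it from matching digits inside
# longer numbers such as 1000 or 12345.
SMALL_NUMBER = re.compile(r'\b([0-9]|[1-9][0-9]{1,2})\b')

# The longer generator-style pattern from the question, as a raw string.
GENERATED = re.compile(
    r'\b([0-9]|[1-8][0-9]|9[0-9]|[1-8][0-9]{2}|9[0-8][0-9]|99[0-9])\b'
)


def clean_data(data_to_clean):
    """Replace every standalone number from 0 to 999 with a space."""
    return SMALL_NUMBER.sub(' ', data_to_clean)


# Without the r prefix, '\b' is the backspace character, not a word boundary.
assert '\b' == '\x08'
assert r'\b' == '\\b'

# For the plain decimal strings "0" to "9999", both patterns fully match
# exactly the values 0-999.
for n in range(10000):
    s = str(n)
    assert bool(SMALL_NUMBER.fullmatch(s)) == bool(GENERATED.fullmatch(s)) == (n <= 999)

print(clean_data("order 7 shipped 42 boxes and 999 items, ref 1000 and 12345"))
# prints: order   shipped   boxes and   items, ref 1000 and 12345
```

Keep in mind that \b treats punctuation as a boundary, so this pattern also removes both the 3 and the 14 in 3.14, and the 1 in 1,000. If the text contains decimals or thousands separators, you may need to adjust the boundaries (for example with lookarounds).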