I want to clean a string containing a price range, 'GBP 10,000,000 – GBP 15,000,000':
remove the currency code GBP
and replace the dash (-) with a comma (,) using a regex in Python.
The output I want is (10000000,15000000).
This is what I tried:
re.sub('[GBP,/s-]','', text)
which produces the output ' 10000000 – 15000000'
I would also like to strip the leading and trailing whitespace while replacing the dash (-) with a comma (,), so that the result is the tuple (10000000,15000000).
Using re.sub with a callback function, we can try:
import re
inp = "GBP 10,000,000 – GBP 15,000,000"
# Capture both amounts; the callback strips the thousands separators and builds the result string
output = re.sub(r'[A-Z]{3} (\d{1,3}(?:,\d{3})*) – [A-Z]{3} (\d{1,3}(?:,\d{3})*)', lambda m: '(' + m.group(1).replace(',', '') + ',' + m.group(2).replace(',', '') + ')', inp)
print(output) # (10000000,15000000)
If you want an actual list/tuple of matches, then I suggest using re.findall:
inp = "GBP 10,000,000 – GBP 15,000,000"
output = [x.replace(',', '') for x in re.findall(r'[A-Z]{3} (\d{1,3}(?:,\d{3})*)', inp)]
print(output) # ['10000000', '15000000']
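Since the question asks for a tuple, the re.findall approach can be taken one step further by converting each match to an int. This is only a minimal sketch: parse_price_range is a hypothetical helper name, and it assumes every amount is preceded by a three-letter uppercase currency code. Because it only looks for the amounts, it does not care whether the separator is a hyphen or an en dash, or how much whitespace surrounds it.

```python
import re

def parse_price_range(text):
    """Return the amounts found in `text` as a tuple of ints (hypothetical helper)."""
    # Number pattern: 1-3 digits followed by any number of ",ddd" groups.
    # Assumption: each amount follows a three-letter uppercase currency code.
    amounts = re.findall(r'[A-Z]{3}\s*(\d{1,3}(?:,\d{3})*)', text)
    # Drop the thousands separators and convert each amount to an int.
    return tuple(int(a.replace(',', '')) for a in amounts)

inp = "GBP 10,000,000 – GBP 15,000,000"
print(parse_price_range(inp))  # (10000000, 15000000)
```

If the amounts should stay as strings, drop the int() call and keep only the replace.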