I hope you can help me extract numbers from strings.
I get a string in two possible ways:
This example lists the possible variants of x (x to x7).
In the end I need the extracted number:
=> "60" in this case (exception: x3 = 50)
I tried to use regex split and the strip function, but unfortunately that doesn't work for all variants:
What do I have to change so that it works for all variants?
import re
b=[]
y="60"
# x-x2: split and strip function is working => b="60"
x = "5: 60 USD"
x1= "5. $60 USD"
x2= "5- 60 USD"
# x3-x7: split and strip function is NOT working
x3 ="5: 50 USD"
x4 ="5 : 60 USD"
x5 ="5 . 60 USD"
x6 ="5 - $60 USD"
x7 ="5:$60 USD"
a,b = re.split('5: |5. |5-',x)
b = b.upper().strip(' -§$%&€ABCDEFGHIJKLMNOPQRSTUVWXYZ:')
print(b)
# b should be 60 each time (exception: x3 = 50)
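Before changing the pattern, it may help to look at what `re.split` actually returns for a few of the variants. The sketch below reuses the pattern from the question unchanged; the helper name `split_parts` is made up for illustration. The unpacking `a, b = ...` only works when the result has exactly two pieces. For x3 the unescaped `.` in `5. ` matches any character, so `50 ` is treated as a second separator and three pieces come back. For the variants with a space before the separator (or no space after it) nothing matches, so only one piece comes back.

```python
import re

# Pattern copied from the question; note that '.' is not escaped,
# so '5. ' means "5, any character, space".
PATTERN = '5: |5. |5-'

def split_parts(text):
    """Return the pieces re.split produces for the question's pattern."""
    return re.split(PATTERN, text)

for s in ["5: 60 USD", "5: 50 USD", "5 : 60 USD", "5:$60 USD"]:
    parts = split_parts(s)
    print(repr(s), "->", parts, "(%d parts)" % len(parts))
```

Any input that does not yield exactly two parts raises a `ValueError` at the `a, b = ...` line.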
import re
x = re.sub('[^0-9][.]{0,1}[^0-9]', " ", x)
x = re.sub('USD', "", x)
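try:
    # Normal case: the number is the second whitespace-separated token
    b = x.split()[1]
except IndexError:
    # Fallback when there is no second token
    b = ".".join(x.split(".")[1:])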
Full code:
import re
b=[]
y="60"
# x0-x2: split and strip function is working => b="60"
x0 = "5: 60 USD"
x1= "5. $60 USD"
x2= "5- 60 USD"
# x3-x7: split and strip function is NOT working
x3 ="5: 50 USD"
x4 ="5 : 60 USD"
x5 ="5 . 60 USD"
x6 ="5 - $60 USD"
x7 ="5:$60.000 USD"
x_list = [x0,x1,x2,x3,x4,x5,x6,x7]
for x in x_list:
    print("raw " + x)
    x = re.sub('[^0-9][.]{0,1}[^0-9]', " ", x)
    b = x.split()[1]
    print("clean " + b)
Output:
raw 5: 60 USD
clean 60
raw 5. $60 USD
clean $60
raw 5- 60 USD
clean 60
raw 5: 50 USD
clean 50
raw 5 : 60 USD
clean 60
raw 5 . 60 USD
clean 60
raw 5 - $60 USD
clean 60
raw 5:$60.000 USD
clean 60.000
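An alternative, sketched here under the assumption that every string starts with an index number followed by one of `:`, `.` or `-`: a single `re.search` with a capture group can pick out the number directly instead of substituting and splitting. The name `extract_amount` and the pattern are illustrative and not part of the code above. Unlike the substitution approach, this also skips a leading currency symbol, so x1 gives 60 rather than $60.

```python
import re

# Assumed layout: <index> <separator> [currency symbol] <number> ...
#   \d+             the leading index (e.g. 5)
#   [:.-]           one separator character, with optional spaces around it
#   \D*?            any non-digit characters such as '$'
#   (\d+(?:\.\d+)?) the number we want, with an optional ".ddd" part
AMOUNT_RE = re.compile(r'^\s*\d+\s*[:.-]\s*\D*?(\d+(?:\.\d+)?)')

def extract_amount(text):
    """Return the number after the separator as a string, or None if no match."""
    match = AMOUNT_RE.search(text)
    return match.group(1) if match else None

samples = [
    "5: 60 USD", "5. $60 USD", "5- 60 USD", "5: 50 USD",
    "5 : 60 USD", "5 . 60 USD", "5 - $60 USD", "5:$60.000 USD",
]
for s in samples:
    print("raw", s, "-> clean", extract_amount(s))
```

Returning `None` for strings that do not fit the assumed layout keeps unexpected input visible instead of raising an `IndexError`.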