Tags: python, split, special-characters, strip

How can I extract a part (a number) of a string that contains special characters?


I hope you can help me extract numbers from strings.

I receive the string in one of two forms:

  1. y = just the required number, e.g. "60"
  2. x = the required number inside a string with special characters, possibly preceded by another number

The example below lists the possible variants of x (x and x1 to x7).

In the end I need the extracted number:

=> "60" in every case (exception: x3, which should give "50")

I tried to use re.split and the strip method, but unfortunately this doesn't work for all variants.

What do I have to change so that it works for all variants?

import re

b=[]
y="60"

# x-x2: split and strip function is working => b="60"
x = "5: 60 USD"
x1= "5. $60 USD"
x2= "5- 60 USD"

# x3-x7: split and strip function is NOT working
x3 ="5: 50 USD"
x4 ="5 : 60 USD"
x5 ="5 . 60 USD"
x6 ="5 - $60 USD"
x7 ="5:$60 USD"


a,b = re.split('5: |5. |5-',x)

b = b.upper().strip(' -§$%&€ABCDEFGHIJKLMNOPQRSTUVWXYZ:')

print(b)

#b should be 60 each time (exception: x3 = 50)
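To see why x3 to x7 fail, it helps to count how many pieces re.split returns for each variant, since `a,b = re.split(...)` only succeeds when there are exactly two. This is a small diagnostic sketch reusing the pattern from the code above; the names PATTERN, variants and piece_counts are illustrative. Note that the unescaped "." in "5. " matches any character, so in x3 the "50 " is split a second time, while x4 to x7 contain no match at all.

```python
import re

# The pattern from the question; the unescaped "." in "5. " matches ANY character
PATTERN = '5: |5. |5-'

variants = {
    "x": "5: 60 USD",
    "x1": "5. $60 USD",
    "x2": "5- 60 USD",
    "x3": "5: 50 USD",
    "x4": "5 : 60 USD",
    "x5": "5 . 60 USD",
    "x6": "5 - $60 USD",
    "x7": "5:$60 USD",
}

# `a, b = re.split(...)` needs exactly two pieces, otherwise it raises ValueError
piece_counts = {name: len(re.split(PATTERN, s)) for name, s in variants.items()}

for name, count in piece_counts.items():
    status = "ok" if count == 2 else "unpacking fails"
    print(f"{name}: {count} piece(s) -> {status}")
```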

Solution

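    import re

    x = "5: 60 USD"  # one of the variants from the question

    # Replace a non-digit, an optional ".", and a second non-digit with one space
    x = re.sub('[^0-9][.]{0,1}[^0-9]', " ", x)
    x = re.sub('USD', "", x)

    try:
        # Token 0 is the leading number, token 1 is the amount
        b = x.split()[1]
    except IndexError:
        # Fallback when there is no second token: keep everything after the first "."
        b = ".".join(x.split(".")[1:])
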

    Full code:

    import re

    y = "60"

    # x0-x2: the question's split and strip approach works => b = "60"
    x0 = "5: 60 USD"
    x1 = "5. $60 USD"
    x2 = "5- 60 USD"

    # x3-x7: the question's split and strip approach does NOT work
    x3 = "5: 50 USD"
    x4 = "5 : 60 USD"
    x5 = "5 . 60 USD"
    x6 = "5 - $60 USD"
    x7 = "5:$60.000 USD"

    x_list = [x0, x1, x2, x3, x4, x5, x6, x7]

    for x in x_list:

        print("raw " + x)

        # Replace each "non-digit, optional '.', non-digit" run with one space
        x = re.sub('[^0-9][.]{0,1}[^0-9]', " ", x)

        # Token 0 is the leading number, token 1 is the amount
        b = x.split()[1]

        print("clean " + b)
    

    Output:

    raw 5: 60 USD
    clean 60
    raw 5. $60 USD
    clean $60
    raw 5- 60 USD
    clean 60
    raw 5: 50 USD
    clean 50
    raw 5 : 60 USD
    clean 60
    raw 5 . 60 USD
    clean 60
    raw 5 - $60 USD
    clean 60
    raw 5:$60.000 USD
    clean 60.000
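The output shows that x1 still comes out as "$60": the regex only replaces a non-digit when another non-digit follows it, so once ". " has been consumed, the "$" sitting directly before "6" is left in place. A different approach, sketched here under the assumption that the required amount is always the last number in the string, is to collect all numbers with re.findall and take the last one. This also covers the plain y form such as "60". The names NUMBER and extract_amount are illustrative, not part of the original answer.

```python
import re

# An integer, optionally followed by "." and more digits (e.g. "60", "60.000")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_amount(s: str) -> str:
    """Return the last number in s (assumes the amount is always the last number)."""
    numbers = NUMBER.findall(s)
    if not numbers:
        raise ValueError(f"no number found in {s!r}")
    return numbers[-1]


variants = [
    "60",  # y: only the required number
    "5: 60 USD",
    "5. $60 USD",
    "5- 60 USD",
    "5: 50 USD",
    "5 : 60 USD",
    "5 . 60 USD",
    "5 - $60 USD",
    "5:$60 USD",
    "5:$60.000 USD",
]

for s in variants:
    print(f"{s!r:18} -> {extract_amount(s)}")
```

A string like "5.60 USD" (no space or symbol between the separator and the amount) would be read as a single number 5.60, so the pattern would need adjusting if that form can occur.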