Tags: python, regex, string, python-re

Removing numbers and _ symbol from a parsed URL using re module and sub() function in Python


I'm trying to remove digits and the symbols "-" and "_" from a string that I got by parsing a URL.

For example,

import re

string1 = 'historical-fiction_4'
# Delete every character that is not a lowercase letter
string_cleaned = re.sub("[^a-z]", "", string1)
print(string1)
print(string_cleaned)

historical-fiction_4
historicalfiction

With re.sub("[^a-z]", "", string1) I keep only the letters a to z, but instead of the string "historicalfiction" I would like to get "Historical Fiction".

Almost all of my data follows the structure "name1-name2_number".

If anyone can help me improve my re.sub() call, I'd really appreciate it. Thanks a lot!


Solution

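  • Replace each unwanted character with a space instead of an empty string, strip the leftover whitespace at the ends, and use str.title() to capitalize every word: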

    import re
    
    string1 = "historical-fiction_4"
    
    string1 = re.sub(r"[^a-z]", " ", string1).strip().title()
    print(string1)
    

    Prints:

    Historical Fiction
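
The one-liner above works for the sample input, but two things are worth knowing: the class [^a-z] also replaces uppercase letters with spaces, and two adjacent separators (for example "--") leave a double space in the result. A minimal sketch of a slightly more defensive variant follows, assuming the input always looks roughly like "name1-name2_number". The helper name slug_to_title and the second sample string are made up for illustration and are not part of the original question.

```python
import re


def slug_to_title(slug: str) -> str:
    """Turn a slug such as 'historical-fiction_4' into 'Historical Fiction'.

    Hypothetical helper, not taken from the original answer.
    """
    # Split on any run of non-letter characters (digits, '-', '_', ...),
    # so consecutive separators do not produce empty words in the middle.
    words = re.split(r"[^A-Za-z]+", slug)
    # Drop the empty strings left by leading or trailing separators,
    # then capitalize each remaining word and join with single spaces.
    return " ".join(word.capitalize() for word in words if word)


print(slug_to_title("historical-fiction_4"))  # Historical Fiction
print(slug_to_title("science--fiction_12"))   # Science Fiction
```

Splitting on runs of separators rather than substituting one character at a time avoids the need for strip() and keeps exactly one space between words.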