python, string, python-re

Saving apostrophes in re.sub in Python


I want to keep only plain letters and apostrophes when using re.sub in Python, but right now my code also removes apostrophes, so "don't" becomes "dont", etc. Can I make my re.sub call preserve apostrophes, or do I have to use some other solution?

My code right now:

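import re

# artist, album and song are assumed to be defined earlier
text = open("songs/"+artist+"/"+album+"/"+song, "r", encoding="latin-1")
lines = text.readlines()
for line in lines:
    line = line.lower()
    # keeps only a-z and spaces, so apostrophes are stripped too
    line = re.sub('[^a-z ]', '', line)
    words = line.split(" ")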

Solution

  • The code

    re.sub('[^a-z ]', '', line)
    

    takes every character that is not (^) a lowercase letter a-z or a space, and removes it (by replacing it with the empty string '').

    You want to add the apostrophe to the set of characters that are preserved, which means putting it inside the brackets. Because the pattern is written as a single-quoted Python string, you can either escape the single-quote/apostrophe character with a backslash:

    re.sub('[^a-z \']', '', line)
    

    or delimit the pattern string with double quotes, so the apostrophe needs no escaping:

    re.sub("[^a-z ']", '', line)
    

    Separate comment

    By the way, a modern way of building a string from variables is an f-string (see the Python documentation on formatted string literals). Instead of

    "songs/"+artist+"/"+album+"/"+song
    

    you can use

    f"songs/{artist}/{album}/{song}"
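
Putting both suggestions together, here is a minimal sketch of how the loop from the question could look. The function names clean_line and song_words and the root parameter are hypothetical, added only so the example is self-contained. It also swaps in two things the answer above does not mention: a with block so the file gets closed, and split() without an argument instead of split(" ").

```python
import re

# Negated character class: matches anything that is NOT a-z, a space, or an apostrophe.
NOT_ALLOWED = re.compile("[^a-z ']")

def clean_line(line):
    """Lowercase a line and drop every character outside a-z, space and '."""
    return NOT_ALLOWED.sub("", line.lower())

def song_words(artist, album, song, root="songs"):
    """Return the cleaned words of one song file (root is a hypothetical parameter)."""
    path = f"{root}/{artist}/{album}/{song}"
    words = []
    with open(path, "r", encoding="latin-1") as text:
        for line in text:
            # split() with no argument also discards the empty strings that
            # split(" ") would produce where a removed character left two spaces
            words.extend(clean_line(line).split())
    return words

print(clean_line("Don't stop, it's 2 o'clock!"))  # don't stop it's  o'clock
```

Note the double space in the printed result where the "2" was removed. That is why split() is used here: split(" ") would return an empty string for that position. Unlike the loop in the question, this sketch collects the words of all lines into one list, which may or may not be what you need.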