regex, quotation-marks

Python: Removing all kinds of quotation marks


I have the following string:

txt="Daniel's car é à muito esperto"

I am trying to remove all kinds of quotation marks.

I tried:

txt=re.sub(r"\u0022\u201C\u201D\u0027\u2019\u2018\u2019\u0060\u00B4\'\"", ' ', txt)

I expected:

"Daniel s car é à muito esperto"

but actually nothing is happening.


Solution

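  • The regex does not work because, without square brackets, the pattern matches only the whole sequence of quote characters in exactly that order, and that sequence never occurs in the text: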

    r"\u0022\u201C\u201D\u0027\u2019\u2018\u2019\u0060\u00B4\'\""
    

    To fix that, use either alternation (|) between the characters or a character set ([...]), so that any single one of them matches:

    txt=re.sub(r"[\u0022\u201C\u201D\u0027\u2019\u2018\u2019\u0060\u00B4\'\"]", ' ', txt)
    

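    In Python 3 the re.UNICODE flag is not needed: str patterns are matched as Unicode by default, and the re module itself interprets the \uXXXX escapes (Python 3.3 and later).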
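
A minimal sketch of the difference between the two patterns, assuming Python 3.3 or later. The names QUOTES and replace_quotes are made up for illustration and are not part of the original answer; the duplicate and redundant entries from the original set are left out, which does not change what it matches.

```python
import re

txt = "Daniel's car é à muito esperto"

# Without brackets the pattern is one long literal sequence, so it finds nothing.
sequence = r"\u0022\u201C\u201D\u0027\u2019\u2018\u0060\u00B4"
assert re.search(sequence, txt) is None

# With brackets, any single listed character matches.
# QUOTES and replace_quotes are hypothetical names used only in this sketch.
QUOTES = re.compile(r"[\u0022\u0027\u0060\u00B4\u2018\u2019\u201C\u201D]")

def replace_quotes(text, replacement=" "):
    """Replace every quotation-like character in text with replacement."""
    return QUOTES.sub(replacement, text)

print(replace_quotes(txt))  # Daniel s car é à muito esperto
```

Passing an empty string as replacement would delete the characters instead of leaving a space, which may or may not be what is wanted for apostrophes inside words.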